Knowledge Graph Construction Pipeline¶

This notebook demonstrates the complete process of building a knowledge graph from raw text data. The pipeline includes:

  1. Text collection through web scraping
  2. Text preprocessing and cleaning
  3. Named Entity Recognition (NER)
  4. Relation Extraction (RE)
  5. Knowledge Graph Construction
  6. Visualization and querying

Setup¶

First, let's import the necessary libraries and set up the environment.

In [1]:
# Import standard libraries
import os
import sys
import json
import logging
from pprint import pprint
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
In [2]:
# Add project root to path for importing local modules
# Adjust this path if needed
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

# Import project modules
from src.data_collection.scraper import NewsArticleScraper
from src.preprocessing.cleaner import clean_text
from src.entity_recognition.ner import SpacyNERExtractor, CRFExtractor
from src.entity_recognition.comparison import NERComparison
from src.relation_extraction.extractor import SpacyRelationExtractor
from src.knowledge_graph.builder import KnowledgeGraphBuilder

# Create output directories
os.makedirs('output/data', exist_ok=True)
os.makedirs('output/models', exist_ok=True)
os.makedirs('output/visualization', exist_ok=True)
2025-03-29 16:27:11,432 - datasets - INFO - PyTorch version 2.6.0 available.
2025-03-29 16:27:11,435 - datasets - INFO - TensorFlow version 2.18.0 available.

1. Data Collection¶

Let's collect news articles from Reuters using our web scraper.

In [3]:
# Initialize scraper
scraper = NewsArticleScraper(output_dir='output/data/raw')

# Scrape articles
# Note: This might take a while and might be rate-limited by the website
# For demonstration, we can scrape a smaller number of articles
# We can also use example data if scraping fails
try:
    article_files = scraper.scrape_reuters(num_articles=15, category='business')
    print(f"Scraped {len(article_files)} articles:")
    for file in article_files:
        print(f"- {os.path.basename(file)}")
except Exception as e:
    logger.error(f"Error scraping articles: {e}")
    # Use example data if scraping fails
    logger.info("Using example data instead")
    # Create a simple example article
    example_article = {
        "id": "example1",
        "title": "Apple announces new partnership with Microsoft",
        "url": "https://example.com/article1",
        "source": "example",
        "category": "business",
        "published_date": "2023-01-01",
        "scraped_date": "2023-01-02",
        "content": "Apple Inc. has announced a new partnership with Microsoft Corporation, according to CEO Tim Cook. \
The collaboration will focus on cloud computing services and AI integration. \
The partnership was revealed at a press conference in Cupertino, California yesterday. \
Microsoft CEO Satya Nadella expressed excitement about working with the iPhone maker. \
Apple was founded by Steve Jobs in 1976 and has become one of the world's most valuable companies."
    }
    
    # Save example article
    os.makedirs('output/data/raw', exist_ok=True)
    example_file = 'output/data/raw/example_article.json'
    with open(example_file, 'w', encoding='utf-8') as f:
        json.dump(example_article, f, ensure_ascii=False, indent=4)
    
    article_files = [example_file]
2025-03-29 16:27:13,205 - src.data_collection.scraper - INFO - Output directory set to: output/data/raw
2025-03-29 16:27:13,206 - src.data_collection.scraper - INFO - Scraping 15 articles from Reuters/business
2025-03-29 16:27:13,207 - WDM - INFO - ====== WebDriver manager ======
2025-03-29 16:27:13,965 - WDM - INFO - Get LATEST chromedriver version for google-chrome
2025-03-29 16:27:13,998 - WDM - INFO - Get LATEST chromedriver version for google-chrome
2025-03-29 16:27:14,015 - WDM - INFO - Get LATEST chromedriver version for google-chrome
2025-03-29 16:27:14,068 - WDM - INFO - WebDriver version 134.0.6998.165 selected
2025-03-29 16:27:14,075 - WDM - INFO - Modern chrome version https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.165/win32/chromedriver-win32.zip
2025-03-29 16:27:14,076 - WDM - INFO - About to download new driver from https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.165/win32/chromedriver-win32.zip
2025-03-29 16:27:14,103 - WDM - INFO - Driver downloading response is 200
2025-03-29 16:27:14,857 - WDM - INFO - Get LATEST chromedriver version for google-chrome
2025-03-29 16:27:15,153 - WDM - INFO - Driver has been saved in cache [C:\Users\hbonn\.wdm\drivers\chromedriver\win64\134.0.6998.165]
2025-03-29 16:27:16,606 - src.data_collection.scraper - INFO - Navigating to https://www.reuters.com/business/
2025-03-29 16:27:20,132 - src.data_collection.scraper - INFO - FOUND ARTICLES ELEMENTS : 20
2025-03-29 16:27:22,133 - src.data_collection.scraper - INFO - Dealing with article : 1
2025-03-29 16:27:27,202 - src.data_collection.scraper - WARNING - Invalid article link: None
2025-03-29 16:27:27,203 - src.data_collection.scraper - INFO - Dealing with article : 2
2025-03-29 16:27:32,216 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/world/us-warns-french-companies-they-must-comply-with-trumps-diversity-ban-2025-03-29/
2025-03-29 16:27:32,217 - src.data_collection.scraper - INFO - Dealing with article : 3
2025-03-29 16:27:37,226 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/markets/deals/musks-xai-buys-social-media-platform-x-45-billion-2025-03-28/
2025-03-29 16:27:37,227 - src.data_collection.scraper - INFO - Dealing with article : 4
2025-03-29 16:27:42,236 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/markets/deals/unicredit-gets-ecb-approval-banco-bpm-buy-weigh-options-2025-03-29/
2025-03-29 16:27:42,236 - src.data_collection.scraper - INFO - Dealing with article : 5
2025-03-29 16:27:47,246 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/markets/deals/chinese-state-media-likens-ck-hutchison-panama-port-deal-handing-knife-opponent-2025-03-29/
2025-03-29 16:27:47,246 - src.data_collection.scraper - INFO - Dealing with article : 6
2025-03-29 16:27:52,257 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/world/us/trump-presses-advisers-tariff-escalation-ahead-april-2-washington-post-reports-2025-03-29/
2025-03-29 16:27:52,258 - src.data_collection.scraper - INFO - Dealing with article : 7
2025-03-29 16:27:57,269 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/world/india-us-making-progress-towards-trade-deal-officials-say-2025-03-29/
2025-03-29 16:27:57,269 - src.data_collection.scraper - INFO - Dealing with article : 8
2025-03-29 16:28:02,278 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/sustainability/climate-energy/stellantis-buy-co2-credits-tesla-pool-also-2025-exec-says-2025-03-29/
2025-03-29 16:28:02,279 - src.data_collection.scraper - INFO - Dealing with article : 9
2025-03-29 16:28:07,288 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/markets/deals/nvidia-backed-coreweaves-shares-likely-open-up-25-above-ipo-price-2025-03-28/
2025-03-29 16:28:07,289 - src.data_collection.scraper - INFO - Dealing with article : 10
2025-03-29 16:28:12,299 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/world/us/trump-commutes-ozy-media-founder-watsons-nearly-10-year-sentence-2025-03-29/
2025-03-29 16:28:12,299 - src.data_collection.scraper - INFO - Dealing with article : 11
2025-03-29 16:28:17,311 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/business/finance/glass-lewis-recommends-votes-against-ceo-pay-goldman-sachs-2025-03-29/
2025-03-29 16:28:17,312 - src.data_collection.scraper - INFO - Dealing with article : 12
2025-03-29 16:28:22,322 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/markets/deals/blackstone-evaluates-taking-stake-us-tiktok-spinoff-2025-03-28/
2025-03-29 16:28:22,322 - src.data_collection.scraper - INFO - Dealing with article : 13
2025-03-29 16:28:27,336 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/markets/us/wall-st-futures-edge-lower-tariff-woes-inflation-data-tap-2025-03-28/
2025-03-29 16:28:27,336 - src.data_collection.scraper - INFO - Dealing with article : 14
2025-03-29 16:28:32,347 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/technology/artificial-intelligence/openai-must-complete-for-profit-transition-by-year-end-raise-full-40-billion-2025-03-28/
2025-03-29 16:28:32,347 - src.data_collection.scraper - INFO - Dealing with article : 15
2025-03-29 16:28:37,357 - src.data_collection.scraper - INFO - Article link added: https://www.reuters.com/technology/artificial-intelligence/scale-ai-seeking-valuation-high-25-billion-potential-tender-offer-business-2025-03-28/
2025-03-29 16:28:37,358 - src.data_collection.scraper - INFO - Found 14 article links
2025-03-29 16:28:39,460 - src.data_collection.scraper - INFO - Processing article 1/14: https://www.reuters.com/world/india-us-making-progress-towards-trade-deal-officials-say-2025-03-29/
2025-03-29 16:29:15,547 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:29:15,562 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T14:50:38Z
2025-03-29 16:29:15,638 - src.data_collection.scraper - INFO - Content found: NEW DELHI, March 29 (Reuters) - Indian and U.S. of...
2025-03-29 16:29:15,639 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_162915_.json
2025-03-29 16:29:38,429 - src.data_collection.scraper - INFO - Processing article 2/14: https://www.reuters.com/business/finance/glass-lewis-recommends-votes-against-ceo-pay-goldman-sachs-2025-03-29/
2025-03-29 16:29:56,282 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:29:56,291 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T14:12:13Z
2025-03-29 16:29:56,332 - src.data_collection.scraper - INFO - Content found: March 29 (Reuters) - Proxy adviser Glass Lewis rec...
2025-03-29 16:29:56,333 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_162956_.json
2025-03-29 16:30:26,297 - src.data_collection.scraper - INFO - Processing article 3/14: https://www.reuters.com/markets/deals/blackstone-evaluates-taking-stake-us-tiktok-spinoff-2025-03-28/
2025-03-29 16:30:42,744 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:30:42,753 - src.data_collection.scraper - INFO - Publication date found: 2025-03-28T21:37:58Z
2025-03-29 16:30:42,817 - src.data_collection.scraper - INFO - Content found: March 28 (Reuters) - Private equity firm Blackston...
2025-03-29 16:30:42,818 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163042_.json
2025-03-29 16:31:13,388 - src.data_collection.scraper - INFO - Processing article 4/14: https://www.reuters.com/markets/deals/unicredit-gets-ecb-approval-banco-bpm-buy-weigh-options-2025-03-29/
2025-03-29 16:31:31,058 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:31:31,066 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T09:26:30Z
2025-03-29 16:31:31,137 - src.data_collection.scraper - INFO - Content found: MILAN, March 29 (Reuters) - UniCredit (CRDI.MI)
, ...
2025-03-29 16:31:31,137 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163131_.json
2025-03-29 16:32:02,793 - src.data_collection.scraper - INFO - Processing article 5/14: https://www.reuters.com/world/us/trump-presses-advisers-tariff-escalation-ahead-april-2-washington-post-reports-2025-03-29/
2025-03-29 16:32:18,972 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:32:18,981 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T14:51:59Z
2025-03-29 16:32:19,035 - src.data_collection.scraper - INFO - Content found: March 29 (Reuters) - U.S. President Donald Trump i...
2025-03-29 16:32:19,036 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163219_.json
2025-03-29 16:32:45,541 - src.data_collection.scraper - INFO - Processing article 6/14: https://www.reuters.com/markets/deals/chinese-state-media-likens-ck-hutchison-panama-port-deal-handing-knife-opponent-2025-03-29/
2025-03-29 16:33:04,933 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:33:04,941 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T10:40:44Z
2025-03-29 16:33:04,985 - src.data_collection.scraper - INFO - Content found: BEIJING, March 29 (Reuters) - Chinese state media ...
2025-03-29 16:33:04,987 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163304_.json
2025-03-29 16:33:22,635 - src.data_collection.scraper - INFO - Processing article 7/14: https://www.reuters.com/sustainability/climate-energy/stellantis-buy-co2-credits-tesla-pool-also-2025-exec-says-2025-03-29/
2025-03-29 16:33:39,620 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:33:39,629 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T13:15:23Z
2025-03-29 16:33:39,687 - src.data_collection.scraper - INFO - Content found: TURIN, Italy, March 29 (Reuters) - Stellantis will...
2025-03-29 16:33:39,688 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163339_.json
2025-03-29 16:34:02,486 - src.data_collection.scraper - INFO - Processing article 8/14: https://www.reuters.com/technology/artificial-intelligence/scale-ai-seeking-valuation-high-25-billion-potential-tender-offer-business-2025-03-28/
2025-03-29 16:34:17,011 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:34:17,021 - src.data_collection.scraper - INFO - Publication date found: 2025-03-28T23:42:00Z
2025-03-29 16:34:17,078 - src.data_collection.scraper - INFO - Content found: March 28 (Reuters) - Artificial intelligence start...
2025-03-29 16:34:17,080 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163417_.json
2025-03-29 16:34:45,882 - src.data_collection.scraper - INFO - Processing article 9/14: https://www.reuters.com/world/us/trump-commutes-ozy-media-founder-watsons-nearly-10-year-sentence-2025-03-29/
2025-03-29 16:34:57,751 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:34:57,760 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T14:16:27Z
2025-03-29 16:34:57,808 - src.data_collection.scraper - INFO - Content found: March 28 (Reuters) - U.S. President Donald Trump h...
2025-03-29 16:34:57,810 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163457_.json
2025-03-29 16:35:27,944 - src.data_collection.scraper - INFO - Processing article 10/14: https://www.reuters.com/markets/us/wall-st-futures-edge-lower-tariff-woes-inflation-data-tap-2025-03-28/
2025-03-29 16:35:43,070 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:35:43,079 - src.data_collection.scraper - INFO - Publication date found: 2025-03-28T23:50:13Z
2025-03-29 16:35:43,239 - src.data_collection.scraper - INFO - Content found: March 28 (Reuters) - Wall Street stocks ended shar...
2025-03-29 16:35:43,241 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163543_.json
2025-03-29 16:36:10,817 - src.data_collection.scraper - INFO - Processing article 11/14: https://www.reuters.com/world/us-warns-french-companies-they-must-comply-with-trumps-diversity-ban-2025-03-29/
2025-03-29 16:36:25,114 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:36:25,123 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T11:20:42Z
2025-03-29 16:36:25,212 - src.data_collection.scraper - INFO - Content found: PARIS, March 29 (Reuters) - The Trump administrati...
2025-03-29 16:36:25,213 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163625_.json
2025-03-29 16:36:53,730 - src.data_collection.scraper - INFO - Processing article 12/14: https://www.reuters.com/markets/deals/musks-xai-buys-social-media-platform-x-45-billion-2025-03-28/
2025-03-29 16:37:07,072 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:37:07,081 - src.data_collection.scraper - INFO - Publication date found: 2025-03-29T05:25:28Z
2025-03-29 16:37:07,204 - src.data_collection.scraper - INFO - Content found: March 28 (Reuters) - Elon Musk's xAI has acquired ...
2025-03-29 16:37:07,205 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163707_.json
2025-03-29 16:37:36,660 - src.data_collection.scraper - INFO - Processing article 13/14: https://www.reuters.com/technology/artificial-intelligence/openai-must-complete-for-profit-transition-by-year-end-raise-full-40-billion-2025-03-28/
2025-03-29 16:37:50,930 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:37:50,938 - src.data_collection.scraper - INFO - Publication date found: 2025-03-28T21:38:50Z
2025-03-29 16:37:50,978 - src.data_collection.scraper - INFO - Content found: March 28 (Reuters) - OpenAI must transition to a f...
2025-03-29 16:37:50,980 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163750_.json
2025-03-29 16:38:15,401 - src.data_collection.scraper - INFO - Processing article 14/14: https://www.reuters.com/markets/deals/nvidia-backed-coreweaves-shares-likely-open-up-25-above-ipo-price-2025-03-28/
2025-03-29 16:38:33,404 - src.data_collection.scraper - INFO - Title found: 
2025-03-29 16:38:33,414 - src.data_collection.scraper - INFO - Publication date found: 2025-03-28T22:29:57Z
2025-03-29 16:38:33,538 - src.data_collection.scraper - INFO - Content found: March 28 (Reuters) - CoreWeave's shares closed fla...
2025-03-29 16:38:33,539 - src.data_collection.scraper - INFO - Saved article to output/data/raw\reuters_20250329_163833_.json
2025-03-29 16:38:55,901 - src.data_collection.scraper - INFO - Scraped 14 articles from Reuters
Scraped 14 articles:
- reuters_20250329_162915_.json
- reuters_20250329_162956_.json
- reuters_20250329_163042_.json
- reuters_20250329_163131_.json
- reuters_20250329_163219_.json
- reuters_20250329_163304_.json
- reuters_20250329_163339_.json
- reuters_20250329_163417_.json
- reuters_20250329_163457_.json
- reuters_20250329_163543_.json
- reuters_20250329_163625_.json
- reuters_20250329_163707_.json
- reuters_20250329_163750_.json
- reuters_20250329_163833_.json

2. Text Preprocessing¶

Now let's preprocess the text from the articles we collected. We retrieve the .json files and check for duplicates.

In [4]:
# Load articles from raw data directory with duplicate detection
import glob
import hashlib

def load_articles_from_directory(directory_path):
    """
    Load JSON articles from a directory with duplicate detection.
    
    Args:
        directory_path (str): Path to directory containing JSON files
        
    Returns:
        list: List of unique article dictionaries
    """
    # Get all JSON files in the directory
    json_files = glob.glob(f"{directory_path}/*.json")
    print(f"Found {len(json_files)} JSON files in {directory_path}")
    
    articles = []
    article_hashes = set()  # Store content hashes to detect duplicates
    
    for file in json_files:
        try:
            with open(file, 'r', encoding='utf-8') as f:
                article = json.load(f)
                
                # Skip if article doesn't have content
                if 'content' not in article or not article['content']:
                    print(f"Skipping {os.path.basename(file)}: No content")
                    continue
                
                # Create a hash of the article content to detect duplicates
                content_hash = hashlib.md5(article['content'].encode('utf-8')).hexdigest()
                
                if content_hash in article_hashes:
                    print(f"Skipping {os.path.basename(file)}: Duplicate content")
                    continue
                
                # Add hash to set and article to list
                article_hashes.add(content_hash)
                articles.append(article)
                print(f"Loaded {os.path.basename(file)}")
                
        except Exception as e:
            print(f"Error loading {os.path.basename(file)}: {e}")
    
    print(f"\nLoaded {len(articles)} unique articles successfully")
    return articles

# Path to raw data directory
raw_data_dir = './output/data/raw'

# Load unique articles
articles = load_articles_from_directory(raw_data_dir)

# Extract text content
article_texts = [article['content'] for article in articles]
article_titles = [article.get('title', 'Untitled') for article in articles]

# Show first article
if article_texts:
    print(f"Article Title: {article_titles[0] if article_titles[0] else 'No title'}")
    print(f"\nRaw Text:")
    print(article_texts[0][:500] + "..." if len(article_texts[0]) > 500 else article_texts[0])
else:
    print("No articles with content found")
Found 115 JSON files in ./output/data/raw
Loaded example_article.json
Loaded reuters_20250307_205801_.json
Loaded reuters_20250307_205849_.json
Loaded reuters_20250307_210517_.json
Loaded reuters_20250307_210553_.json
Loaded reuters_20250307_210638_.json
Loaded reuters_20250307_210718_.json
Loaded reuters_20250307_210755_.json
Loaded reuters_20250307_210831_.json
Skipping reuters_20250307_210912_.json: Duplicate content
Loaded reuters_20250307_211000_.json
Loaded reuters_20250307_211035_.json
Loaded reuters_20250307_211115_.json
Loaded reuters_20250307_211202_.json
Loaded reuters_20250307_211251_.json
Loaded reuters_20250307_211325_.json
Loaded reuters_20250307_211408_.json
Loaded reuters_20250309_141026_.json
Skipping reuters_20250309_141113_.json: Duplicate content
Loaded reuters_20250309_141159_.json
Loaded reuters_20250309_141236_.json
Loaded reuters_20250309_141319_.json
Skipping reuters_20250309_141359_.json: Duplicate content
Loaded reuters_20250309_141436_.json
Skipping reuters_20250309_141518_.json: Duplicate content
Loaded reuters_20250309_141602_.json
Loaded reuters_20250309_141648_.json
Loaded reuters_20250309_141727_.json
Loaded reuters_20250309_141815_.json
Loaded reuters_20250309_141855_.json
Loaded reuters_20250309_141925_.json
Loaded reuters_20250310_153147_.json
Loaded reuters_20250310_153226_.json
Loaded reuters_20250310_153308_.json
Loaded reuters_20250310_153345_.json
Loaded reuters_20250310_153429_.json
Loaded reuters_20250310_153511_.json
Loaded reuters_20250310_153553_.json
Loaded reuters_20250310_153634_.json
Loaded reuters_20250310_153719_.json
Loaded reuters_20250310_153752_.json
Loaded reuters_20250310_153834_.json
Loaded reuters_20250310_153914_.json
Loaded reuters_20250310_154002_.json
Loaded reuters_20250310_154050_.json
Loaded reuters_20250312_105206_.json
Loaded reuters_20250312_105249_.json
Loaded reuters_20250312_105325_.json
Loaded reuters_20250312_105403_.json
Loaded reuters_20250312_105442_.json
Loaded reuters_20250312_105520_.json
Loaded reuters_20250312_105558_.json
Loaded reuters_20250312_105635_.json
Loaded reuters_20250312_105723_.json
Loaded reuters_20250312_105756_.json
Loaded reuters_20250312_105830_.json
Loaded reuters_20250312_105910_.json
Loaded reuters_20250312_105955_.json
Loaded reuters_20250312_110042_.json
Loaded reuters_20250324_234548_.json
Loaded reuters_20250324_234635_.json
Loaded reuters_20250324_234719_.json
Loaded reuters_20250324_234804_.json
Loaded reuters_20250324_234844_.json
Loaded reuters_20250324_234922_.json
Loaded reuters_20250324_235000_.json
Loaded reuters_20250324_235041_.json
Loaded reuters_20250324_235124_.json
Loaded reuters_20250324_235201_.json
Loaded reuters_20250324_235236_.json
Loaded reuters_20250324_235310_.json
Loaded reuters_20250324_235347_.json
Loaded reuters_20250324_235419_.json
Loaded reuters_20250325_110930_.json
Loaded reuters_20250325_111011_.json
Loaded reuters_20250325_111052_.json
Loaded reuters_20250325_111127_.json
Loaded reuters_20250325_111201_.json
Loaded reuters_20250325_111241_.json
Loaded reuters_20250325_111320_.json
Loaded reuters_20250325_111404_.json
Loaded reuters_20250325_111449_.json
Loaded reuters_20250325_111539_.json
Loaded reuters_20250325_111616_.json
Loaded reuters_20250325_111700_.json
Loaded reuters_20250325_111738_.json
Loaded reuters_20250325_111816_.json
Loaded reuters_20250326_093253_.json
Loaded reuters_20250326_093333_.json
Loaded reuters_20250326_093407_.json
Loaded reuters_20250326_093444_.json
Loaded reuters_20250326_093528_.json
Loaded reuters_20250326_093614_.json
Loaded reuters_20250326_093704_.json
Loaded reuters_20250326_093751_.json
Loaded reuters_20250326_093837_.json
Loaded reuters_20250326_093921_.json
Loaded reuters_20250326_094008_.json
Loaded reuters_20250326_094052_.json
Loaded reuters_20250326_094131_.json
Loaded reuters_20250326_094209_.json
Loaded reuters_20250329_162915_.json
Loaded reuters_20250329_162956_.json
Loaded reuters_20250329_163042_.json
Loaded reuters_20250329_163131_.json
Loaded reuters_20250329_163219_.json
Loaded reuters_20250329_163304_.json
Loaded reuters_20250329_163339_.json
Loaded reuters_20250329_163417_.json
Loaded reuters_20250329_163457_.json
Loaded reuters_20250329_163543_.json
Loaded reuters_20250329_163625_.json
Loaded reuters_20250329_163707_.json
Loaded reuters_20250329_163750_.json
Loaded reuters_20250329_163833_.json

Loaded 111 unique articles successfully
Article Title: Apple announces new partnership with Microsoft

Raw Text:
Apple Inc. has announced a new partnership with Microsoft Corporation, according to CEO Tim Cook. The collaboration will focus on cloud computing services and AI integration. The partnership was revealed at a press conference in Cupertino, California yesterday. Microsoft CEO Satya Nadella expressed excitement about working with the iPhone maker. Apple was founded by Steve Jobs in 1976 and has become one of the world's most valuable companies.
In [5]:
# Preprocess text
cleaned_texts = []
for text in article_texts:
    # Clean the text but keep capitalization for NER
    cleaned = clean_text(
        text,
        lowercase=False,  # Keep case for NER
        remove_stops=False,  # Keep stop words for context
        lemmatize=False,  # Don't lemmatize to preserve entities
    )
    cleaned_texts.append(cleaned)

# Show first cleaned text
print(f"Cleaned Text:")
print(cleaned_texts[0][:500] + "..." if len(cleaned_texts[0]) > 500 else cleaned_texts[0])
2025-03-29 16:38:59,048 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,049 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,050 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,051 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,051 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,052 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,052 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,053 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,053 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,054 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,055 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,056 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,056 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,056 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,057 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,057 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,058 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,058 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,058 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,059 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,059 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,059 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,060 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,060 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,061 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,061 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,061 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,062 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,062 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,062 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,063 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,063 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,064 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,064 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,064 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,065 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,065 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,065 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,066 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,066 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,066 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,067 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,067 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,068 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,069 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,070 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,070 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,071 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,071 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,072 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,072 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,073 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,074 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,075 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,075 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,076 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,076 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,076 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,077 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,078 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,078 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,079 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,079 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,080 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,080 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,081 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,081 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,083 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,083 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,084 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,084 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,085 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,085 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,086 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,087 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,087 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,088 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,090 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,090 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,091 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,091 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,092 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,093 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,094 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,094 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,095 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,095 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,096 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,097 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,097 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,098 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,098 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,099 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,100 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,100 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,101 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,101 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,102 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,102 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,103 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,104 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,104 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,105 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,105 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,106 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,106 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,107 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,107 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,108 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,108 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,109 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,109 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,109 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,110 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,110 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,111 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,111 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,112 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,112 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,112 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,113 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,113 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,113 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,114 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,114 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,114 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,115 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,115 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,115 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,118 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,119 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,120 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,120 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,121 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,121 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,122 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,123 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,123 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,124 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,125 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,125 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,126 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,126 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,127 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,127 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,131 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,132 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,132 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,133 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,134 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,134 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,135 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,135 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,135 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,136 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,136 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,137 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,137 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,138 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,138 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,139 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,139 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,140 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,140 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,141 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,141 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,141 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,142 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,142 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,143 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,143 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,143 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,144 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,144 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,145 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,145 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,146 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,146 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,147 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,147 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,148 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,148 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,149 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,149 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,150 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,151 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,151 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,151 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,152 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,152 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,152 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,153 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,154 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,155 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,155 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,156 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,156 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,157 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,157 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,157 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,158 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,159 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,159 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,159 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,160 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,160 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,160 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,161 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,161 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,161 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,162 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,162 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,163 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,163 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,163 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,164 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,164 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,165 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,166 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,166 - src.preprocessing.cleaner - INFO - Text cleaning completed
2025-03-29 16:38:59,166 - src.preprocessing.cleaner - INFO - Cleaning text...
2025-03-29 16:38:59,167 - src.preprocessing.cleaner - INFO - Text cleaning completed
Cleaned Text:
Apple Inc has announced a new partnership with Microsoft Corporation according to CEO Tim Cook The collaboration will focus on cloud computing services and AI integration The partnership was revealed at a press conference in Cupertino California yesterday Microsoft CEO Satya Nadella expressed excitement about working with the iPhone maker Apple was founded by Steve Jobs in 1976 and has become one of the worlds most valuable companies

3. Named Entity Recognition (NER)¶

Let's extract named entities from the preprocessed text using spaCy.

In [6]:
# Initialize NER extractors
spacy_extractor = SpacyNERExtractor()

# Extract entities
all_entities = []
for i, text in enumerate(cleaned_texts):
    entities = spacy_extractor.extract_entities(text)
    all_entities.append(entities)
    
    print(f"\nEntities from article {i+1}: {article_titles[i]}")
    for entity, entity_type in entities:
        print(f"- {entity} ({entity_type})")
2025-03-29 16:39:03,673 - src.entity_recognition.ner - INFO - Loaded spaCy model: en_core_web_sm
Entities from article 1: Apple announces new partnership with Microsoft
- Apple Inc (ORG)
- Microsoft Corporation (ORG)
- Tim Cook (PERSON)
- AI (GPE)
- Cupertino (GPE)
- California (GPE)
- yesterday (DATE)
- Microsoft (ORG)
- Satya Nadella (PERSON)
- iPhone (ORG)
- Apple (ORG)
- Steve Jobs (PERSON)
- 1976 (DATE)

Entities from article 2: 
- March 7 (DATE)
- Reuters - Wall Streets (ORG)
- Friday (DATE)
- Fed (ORG)
- Jerome Powell (PERSON)
- Trump (ORG)
- Powell (PERSON)
- the Federal Reserve (ORG)
- Fed (ORG)
- Oliver Pursche (PERSON)
- Wealthspire Advisors At (ORG)
- 0100 pm ET (TIME)
- 7145 (CARDINAL)
- 017 to 4250763 (CARDINAL)
- 500 (CARDINAL)
- 2191 (CARDINAL)
- 038 (CARDINAL)
- 10492 (CARDINAL)
- 058 (CARDINAL)
- 18 (CARDINAL)
- Wells Fargo (ORG)
- Goldman Sachs (ORG)
- 19 (CARDINAL)
- Meta METAO (PERSON)
- Amazoncom (ORG)
- more than 3 (CARDINAL)
- Labor Department (ORG)
- February (DATE)
- the previous month (DATE)
- 41 (CARDINAL)
- first (ORDINAL)
- this year to June (DATE)
- May (DATE)
- LSEG Adding (PERSON)
- Morgan Stanley (ORG)
- Goldman Sachs (ORG)
- Equities (ORG)
- this year (DATE)
- mid-December (DATE)
- Donald Trumps (PERSON)
- Nasdaq (ORG)
- 10 (CARDINAL)
- December (DATE)
- 500 (CARDINAL)
- Trumps (PERSON)
- weekly (DATE)
- September Equity (ORG)
- weekly (DATE)
- four weeks (DATE)
- the week ended on March 5 (DATE)
- Thursday (DATE)
- four-week (DATE)
- Canada (GPE)
- Mexico (GPE)
- US (GPE)
- China (GPE)
- Hewlett Packard Enterprise (ORG)
- 158 (CARDINAL)
- annual (DATE)
- US (GPE)
- Costco (ORG)
- 7 (CARDINAL)
- Street (ORG)
- quarterly (DATE)
- 55 (CARDINAL)
- second-quarter (DATE)
- 118 (CARDINAL)
- NYSE (ORG)
- 136 (CARDINAL)
- 500 (CARDINAL)
- seven (CARDINAL)
- 52-week (DATE)
- 13 (CARDINAL)
- 18 (CARDINAL)
- 140 (CARDINAL)

Entities from article 3: 
- WASHINGTON (GPE)
- March 7 (DATE)
- Reuters (ORG)
- US (GPE)
- February (DATE)
- this year (DATE)
- The Labor Departments (ORG)
- Friday (DATE)
- first (ORDINAL)
- Donald Trumps (PERSON)
- 3-12-year (TIME)
- last month (DATE)
- the Great Recession Economists (FAC)
- Trump (ORG)
- January (DATE)
- Trumps (PERSON)
- November (DATE)
- Bernard Baumohl (PERSON)
- the Economic Outlook Group Nonfarm (ORG)
- 151000 (CARDINAL)
- last month (DATE)
- 125000 (CARDINAL)
- January (DATE)
- the Labor Departments Bureau of Labor Statistics (ORG)
- Reuters (ORG)
- 160000 (CARDINAL)
- 143000 (CARDINAL)
- January (DATE)
- 138000 per month (DATE)
- this year (DATE)
- 209000 (CARDINAL)
- the fourth quarter (DATE)
- the first quarter (DATE)
- Scott Anderson (PERSON)
- US (GPE)
- BMO Capital Markets Trump (ORG)
- this week (DATE)
- 25 (CARDINAL)
- Mexico (GPE)
- Canada (GPE)
- Chinese (NORP)
- 20 (CARDINAL)
- Thursday (DATE)
- Trump (PERSON)
- Canada (GPE)
- Mexico (GPE)
- North American (NORP)
- a month (DATE)
- 25 (CARDINAL)
- winter (DATE)
- the average workweek (DATE)
- five-year (DATE)
- 341 hours (TIME)
- 404000 (CARDINAL)
- the workweek (DATE)
- hours (TIME)
- Julia Pollak (ORG)
- ZipRecruiter Healthcare (ORG)
- 52000 (CARDINAL)
- 21000 (CARDINAL)
- Transportation (ORG)
- 18000 (CARDINAL)
- Employment (PRODUCT)
- 11000 (CARDINAL)
- 10000 (DATE)
- 19000 (CARDINAL)
- 6700 (CARDINAL)
- Elon Musks Department of Government Efficiency (ORG)
- DOGE (ORG)
- thousands (CARDINAL)
- one (CARDINAL)
- recent years (DATE)
- 11000 (CARDINAL)
- six-month (DATE)
- 35000 (CARDINAL)
- Professional (ORG)
- 2000 (CARDINAL)
- The Federal Reserve (ORG)
- overnight (TIME)
- 425 (CARDINAL)
- this month (DATE)
- US (GPE)
- June (DATE)
- Fed (ORG)
- January (DATE)
- 100 (CARDINAL)
- September (DATE)
- 525 (CARDINAL)
- 2022 (DATE)
- 2023 (DATE)
- Fed (ORG)
- Jerome Powell (PERSON)
- Friday (DATE)
- Powells (ORG)
- US Treasury (ORG)
- Retail (PERSON)
- 6000 (CARDINAL)
- one (CARDINAL)
- 27500 (CARDINAL)
- hourly (TIME)
- 03 (CARDINAL)
- 04 (CARDINAL)
- January Annual (DATE)
- 41 (CARDINAL)
- 39 (CARDINAL)
- January (DATE)
- January (DATE)
- The Atlanta Fed (ORG)
- 24 (CARDINAL)
- this quarter (DATE)
- 23 (CARDINAL)
- the fourth quarter (DATE)
- 41 (CARDINAL)
- 40 (CARDINAL)
- January (DATE)
- 588000 (CARDINAL)
- About 385000 (CARDINAL)
- last month (DATE)
- two-year (DATE)
- 624 (CARDINAL)
- 626 (CARDINAL)
- January (DATE)
- 599 (CARDINAL)
- 601 (CARDINAL)
- January (DATE)
- 460000 (CARDINAL)
- June 2023 (DATE)
- 49 million (CARDINAL)
- 80 (CARDINAL)
- October 2021 (DATE)
- 75 (CARDINAL)
- January (DATE)
- 8860 million (CARDINAL)
- 8764 million (CARDINAL)
- January (DATE)
- 54 (CARDINAL)
- April 2009 (DATE)
- March (DATE)
- Conrad DeQuadros (PERSON)
- Brean Capital (ORG)

Entities from article 4: 
- March 7 (DATE)
- Reuters - US Federal Reserve (ORG)
- Adriana Kugler (PERSON)
- Friday (DATE)
- 2 (CARDINAL)
- Kugler (PERSON)
- the Conference on Monetary Policy Transmission (ORG)
- Lisbon (GPE)
- the Banco de Portugal Going (ORG)
- Kugler (PERSON)
- Fed (ORG)
- March 18-19 (DATE)
- overnight (TIME)
- between 425 and 45 (CARDINAL)
- Fed (ORG)
- later in the year (DATE)
- Donald Trumps (PERSON)
- Kugler (PERSON)
- the past couple of months (DATE)
- Kugler (PERSON)
- US (GPE)

Entities from article 5: 
- NEW YORK (GPE)
- March 7 (DATE)
- Reuters - US (ORG)
- Friday (DATE)
- Federal Reserve (ORG)
- Jerome Powell (PERSON)
- weekly (DATE)
- Powell (PERSON)
- Donald Trumps (PERSON)
- this week (DATE)
- Trumps (PERSON)
- Canada Mexico (GPE)
- China (GPE)
- Powell (PERSON)
- Fed (ORG)
- Powell (PERSON)
- Jamie Cox (PERSON)
- Harris Financial Group (ORG)
- Richmond (GPE)
- Virginia (GPE)
- Powells (ORG)
- 500 11 (DATE)
- SPNY (ORG)
- SPLRCI (ORG)
- three (CARDINAL)
- the week (DATE)
- Nasdaq (ORG)
- nearly 4 (CARDINAL)
- weekly (DATE)
- September (DATE)
- Nasdaq (ORG)
- 10 (CARDINAL)
- December (DATE)
- 255 pm (TIME)
- 19708 (DATE)
- 046 to 4277616 (CARDINAL)
- 500 (CARDINAL)
- 2486 (DATE)
- 043 to 576338 (CARDINAL)
- 9941 (CARDINAL)
- 054 to 1816737 (CARDINAL)
- Friday (DATE)
- US (GPE)
- February (DATE)
- the previous month (DATE)
- thousands (CARDINAL)
- 41 (CARDINAL)
- Morgan Stanley (ORG)
- Goldman Sachs (ORG)
- Adam Hetts (PERSON)
- Janus Henderson Investors (ORG)
- Thursday (DATE)
- Trump (PERSON)
- four-week (DATE)
- Canada (GPE)
- Mexico (GPE)
- US (GPE)
- China Reciprocal (ORG)
- Hewlett Packard Enterprise (ORG)
- 13 (CARDINAL)
- annual (DATE)
- US (GPE)
- Costco (ORG)
- 7 (CARDINAL)
- Street (ORG)
- quarterly (DATE)
- 76 (DATE)
- second-quarter (DATE)
- Advancing (NORP)
- 132 (CARDINAL)
- NYSE (ORG)
- 75 (CARDINAL)
- 121 (CARDINAL)
- NYSE (ORG)
- 500 (CARDINAL)
- 8 (CARDINAL)
- 52-week (DATE)
- 13 (CARDINAL)
- 21 (CARDINAL)
- 148 (CARDINAL)

Entities from article 6: 
- March 7 (DATE)
- Reuters - Morgan Stanley (ORG)
- 2025 (CARDINAL)
- the United States (GPE)
- Friday (DATE)
- this year (DATE)
- 2026 (DATE)
- Morgan Stanley (ORG)
- Michael T Gapen (PERSON)
- 2025 (DATE)
- 15 (CARDINAL)
- 19 earlier (DATE)
- 2026 (DATE)
- 12 (CARDINAL)
- 13 (CARDINAL)
- Trumps (PERSON)
- US (GPE)
- Morgan Stanley (ORG)
- 25 (CARDINAL)
- the US Federal Reserve (ORG)
- this year (DATE)
- June (DATE)
- Gapen (PERSON)
- nearly three (CARDINAL)
- this year (DATE)
- Goldman Sachs (ORG)
- 2025 (DATE)
- 17 (CARDINAL)
- 22 (CARDINAL)
- 12-month (DATE)
- 20 (CARDINAL)
- 15 (CARDINAL)

Entities from article 7: 
- FRANKFURT (ORG)
- March 7 (DATE)
- Reuters - Porsche (ORG)
- Volkswagens (GPE)
- 2024 (CARDINAL)
- around 20 billion euros 217 billion (MONEY)
- Friday (DATE)
- Europes (GPE)
- first (ORDINAL)
- December (DATE)
- Volkswagens (GPE)
- Porsche (ORG)
- Porsche (ORG)
- 319 (CARDINAL)
- Volkswagen (ORG)
- 125 (CARDINAL)
- Porsche AG Porsche (ORG)
- Volkswagen (ORG)
- annual (DATE)
- Porsche (ORG)
- annual (DATE)
- March 26 (DATE)
- Volkswagen (ORG)
- Porsche AG (ORG)
- 199 billion (MONEY)
- 34 billion (CARDINAL)
- 52 billion (CARDINAL)
- end-2024 (DATE)
- Porsche (ORG)
- the past year (DATE)
- 1 (CARDINAL)
- 09231 (DATE)

Entities from article 8: 
- NEW YORK (GPE)
- March 7 (DATE)
- Reuters - Seesaw (ORG)
- Trump (PERSON)
- 500 (CARDINAL)
- 43 (CARDINAL)
- Donald Trump (PERSON)
- January 20 (DATE)
- one (CARDINAL)
- Art Hogan (PERSON)
- B Riley A (ORG)
- Thursday (DATE)
- Trump (ORG)
- Canada (GPE)
- Mexico (GPE)
- a month (DATE)
- 25 (CARDINAL)
- earlier this week (DATE)
- 26 (CARDINAL)
- the day (DATE)
- December 16 (DATE)
- Trump (ORG)
- Bill Sterling (PERSON)
- GWK Investment Management The CBOE Volatility (ORG)
- Thursday (DATE)
- December 18 at 2487 (DATE)
- first (ORDINAL)
- February 1 (DATE)
- Canada Mexico (GPE)
- China (GPE)
- Brian Jacobsen (PERSON)
- Annex Wealth Management (ORG)
- Trumps (PERSON)
- US (GPE)
- Dennis Dick (PERSON)
- Triple D Trading China (ORG)
- Wednesday (DATE)
- US (GPE)
- European (NORP)
- Germanys (GPE)
- Canadian (NORP)
- Justin Trudeau (PERSON)
- Thursday (DATE)
- US (GPE)
- US Treasury (ORG)
- Scott Bessent (PERSON)
- Trump (ORG)
- first (ORDINAL)
- Trump (ORG)
- Trump (ORG)
- Trump (ORG)
- Gene Goldman (PERSON)
- Cetera Investment Management (ORG)

Entities from article 9: 
- March 7 (DATE)
- Reuters (ORG)
- Apples AAPLO (ORG)
- Siri (PERSON)
- 2026 (DATE)
- Friday (DATE)
- Apple (ORG)
- the coming year (DATE)
- Apple (ORG)
- iPhone (ORG)
- 2025 (DATE)
- Last year (DATE)
- Apple (ORG)
- AI (GPE)
- Apple Intelligence (ORG)
- Siri (GPE)
- Apple (ORG)
- Apple (ORG)
- AI (ORG)
- Apple (ORG)
- 15 billion (CARDINAL)
- Apples (NORP)
- AI (GPE)
- Google (ORG)
- Gemini (PRODUCT)
- last year (DATE)
- Amazon AMZNO (ORG)
- last month (DATE)
- AI (GPE)
- Alexa (ORG)
- 1999 (DATE)

Entities from article 10: 
- WASHINGTON (GPE)
- March 7 (DATE)
- Reuters (ORG)
- The US Trade Representatives Office (ORG)
- Tuesday (DATE)
- Chinese (NORP)
- US (GPE)
- China (GPE)
- Joe Biden (PERSON)
- December (DATE)
- American (NORP)
- Chinas (ORG)
- 50 (CARDINAL)
- US (GPE)
- Chinese (NORP)
- Jan 1 (DATE)

Entities from article 11: 
- WASHINGTON (GPE)
- March 7 (DATE)
- Reuters - The Federal Aviation Administration (ORG)
- Friday (DATE)
- Feb 25 (PRODUCT)
- Chicago (GPE)
- FAA (ORG)
- Thursday (DATE)
- Last month (DATE)
- Southwest Airlines (ORG)
- Chicago Midway (ORG)

Entities from article 12: 
- WINNIPEG Manitoba March 6 (ORG)
- Reuters (ORG)
- only weeks (DATE)
- spring (DATE)
- Canadian (NORP)
- US (GPE)
- North American (NORP)
- US (GPE)
- Canada (GPE)
- months (DATE)
- US (GPE)
- Donald Trump (PERSON)
- 25 (CARDINAL)
- Canadian (NORP)
- Tuesday (DATE)
- one-month (DATE)
- Thursday (DATE)
- Canada (GPE)
- Thursday (DATE)
- second (ORDINAL)
- April 2 (DATE)
- US (GPE)
- Canadian (NORP)
- Saskatchewan Canadian (NORP)
- Florida (GPE)
- Saskatchewan (GPE)
- Scott Hepworth (PERSON)
- US (GPE)
- Hepworth (GPE)
- the Canadian Crops Convention (ORG)
- March 5 (DATE)
- 2024 (DATE)
- the US Department of Agriculture (ORG)
- 22 (CARDINAL)
- 303 (CARDINAL)
- January 3 to 348 (DATE)
- February 28 (DATE)
- Potash (GPE)
- Josh Linville (PERSON)
- Phosphate (ORG)
- Florida (GPE)
- Canada (GPE)
- Canadian (NORP)
- US (GPE)
- 90 (CARDINAL)
- 80 (CARDINAL)
- Canada (GPE)
- 25 (CARDINAL)
- more than 100 (CARDINAL)
- Canada (GPE)
- February 4 (DATE)
- the University of Illinois (ORG)
- Ohio State University Farmers (ORG)
- 100 per ton (QUANTITY)
- Canada (GPE)
- US (GPE)
- Russia (GPE)
- Belarus (GPE)
- Ukraine (GPE)
- Canada (GPE)
- US (GPE)
- Morocco (GPE)
- Canada (GPE)
- US (GPE)
- Canadian (NORP)
- Fertilizer (ORG)
- Canada (GPE)
- US (GPE)
- US (GPE)
- early spring (DATE)
- November (DATE)
- Mark Milam (PERSON)
- One (CARDINAL)
- US (GPE)
- as much as (PERCENT)
- US (GPE)
- Ken Seitz (PERSON)
- Nutrien NTRTO (ORG)
- the BMO Global Metals Mining and Critical Minerals Conference (ORG)
- February 25 (DATE)

Entities from article 13: 
- March 7 (DATE)
- Reuters - Federal Reserve (ORG)
- Adriana Kugler (PERSON)
- Friday (DATE)
- Trump (ORG)
- Kugler (PERSON)
- the Banco de Portugal (ORG)
- the Trump Administration (ORG)

Entities from article 14: 
- March 7 (DATE)
- Reuters - Blackstone BXN (ORG)
- 8 billion (MONEY)
- Friday (DATE)
- North America Europe (LOC)
- Australia (GPE)
- Blackstone (ORG)
- New York (GPE)
- Europe (LOC)
- London (GPE)
- multi-year (DATE)
- New York (GPE)
- Blackstones (NORP)
- less than 2 (CARDINAL)
- more than 60 (CARDINAL)
- 2007 (DATE)

Entities from article 15: 
- WASHINGTON (GPE)
- March 7 (DATE)
- Reuters (ORG)
- The US Federal Communications Commission (ORG)
- Friday (DATE)
- Starlink (ORG)
- Elon Musks SpaceX (ORG)
- ATT (ORG)
- Verizon VZN (ORG)
- FCC (ORG)
- Tesla SpaceX (PERSON)
- millions of dollars (MONEY)
- Donald Trumps (PERSON)
- Trumps (PERSON)
- Department of Government Efficiency (ORG)
- FCC (ORG)
- Brendan Carr (PERSON)
- FCC (ORG)
- Starlink (ORG)
- November (DATE)
- FCC (ORG)
- Starlink (ORG)
- T-Mobile (ORG)
- last month (DATE)
- T-Mobile (ORG)
- 500000 square miles 13 million square km (QUANTITY)
- US (GPE)
- T-Mobile Starlink (ORG)
- November (DATE)
- first (ORDINAL)
- FCC (ORG)
- T-Mobile (ORG)
- 2022 (DATE)
- January 2024 (DATE)
- first (ORDINAL)
- Earth (LOC)
- SpaceXs Falcon (ORG)
- March 2024 (DATE)
- FCC (ORG)
- 4 (CARDINAL)
- 5 (CARDINAL)

Entities from article 16: 
- March 7 (DATE)
- Reuters (ORG)
- US (GPE)
- Donald Trump (PERSON)
- the White House (FAC)
- Friday (DATE)
- Trumps (PERSON)
- Trump (ORG)
- Michael Saylor (PERSON)
- MicroStrategy (ORG)
- Zach Witkoff (PERSON)
- World Liberty Financial (ORG)
- Tenev (GPE)
- Robinhood Markets HOODO (ORG)
- Robinhood Witkoff (FAC)
- Saylor (PERSON)
- Attendees (ORG)
- Trumps (PERSON)
- four (CARDINAL)
- today (DATE)
- White House (ORG)
- David Sacks (PERSON)
- the White House (ORG)
- Friday (DATE)
- Sacks (GPE)
- a long time ago (DATE)
- Trump (PERSON)
- Thursday (DATE)
- Treasury (ORG)
- Commerce (ORG)
- 130 pm ET (TIME)
- 1830 (DATE)
- GMT (ORG)
- The White House (ORG)
- last Friday (DATE)
- February 28 (DATE)
- first (ORDINAL)
- Les Borsai (PERSON)
- Wave Digital (ORG)
- Sacks (ORG)
- Thursday (DATE)
- 14 (CARDINAL)
- 88194 (DATE)
- JP Richardson (ORG)
- Exodus EXODA (ORG)
- four (CARDINAL)
- Trump (ORG)
- Crypto (PERSON)
- Richardson (PERSON)
- Brian Armstrong (PERSON)
- Coinbase (ORG)
- Sunday (DATE)
- Richardson (PERSON)
- Armstrong (PERSON)
- Reuters (ORG)
- Paolo Ardoino (PERSON)
- the White House (FAC)
- Thursday (DATE)
- Reuters (ORG)
- Ardoino (NORP)
- Washington (GPE)
- Tether (PERSON)
- Tether (PERSON)
- US (GPE)
- 2021 (DATE)
- US (GPE)
- Tether (ORG)
- CFTCs (ORG)
- Ripple (GPE)
- Trumps (PERSON)
- XRP (ORG)
- Ripple (GPE)
- one (CARDINAL)
- four (CARDINAL)
- Trump (ORG)
- Attendees (ORG)
- Yesha Yadav (PERSON)
- Vanderbilt University (ORG)
- the Securities and Exchange Commission (ORG)
- Trumps (PERSON)
- World Liberty Financial (ORG)
- Trump (ORG)
- The White House (ORG)

Entities from article 17: 
- March 9 (DATE)
- Reuters (ORG)
- British (NORP)
- Richard Bransons Virgin Group (PERSON)
- 700 million pounds 900 million (MONEY)
- Eurostar (ORG)
- Sunday (DATE)
- Britain (GPE)
- London (GPE)
- Paris (GPE)
- Brussels (GPE)
- Amsterdam (GPE)
- Virgin Group (ORG)
- 300 million pounds (MONEY)
- 400 million pounds (MONEY)
- Reuters (ORG)
- Financial Times (ORG)
- first (ORDINAL)
- Virgins (ORG)
- first (ORDINAL)
- Eurostars 30-year-old (LAW)
- 2029 (DATE)
- Virgin (PERSON)
- Virgin (PERSON)
- Eurostar (ORG)
- Europe (LOC)
- Europe Eurostar (LOC)
- Reuters (ORG)
- 1 07740 pounds (MONEY)

Entities from article 18: 
- March 8 (DATE)
- Reuters - Barrick Gold (ORG)
- Mali (PERSON)
- Africa (LOC)
- the Middle East (LOC)
- Reuters (ORG)
- Saturday (DATE)
- Canadian (NORP)
- Malian (NORP)
- 2023 (DATE)
- West African (NORP)
- Loulo-Gounkoto (ORG)
- Barrick (PERSON)
- January 13 (DATE)
- Mali (GPE)
- around three metric tons (QUANTITY)
- early November (DATE)
- Reuters (ORG)
- February 19 (DATE)
- Barrick (PERSON)
- one (CARDINAL)
- Reuters Previous (ORG)
- Mali (PERSON)
- Saturday (DATE)
- Sebastiaan Bock (PERSON)
- Salaries (ORG)
- annual (DATE)
- one (CARDINAL)
- Reuters (ORG)
- early March (DATE)
- Barrick (PERSON)
- two months (DATE)

Entities from article 19: 
- March 8 (DATE)
- Reuters - Japan (ORG)
- US (GPE)
- Donald Trump (PERSON)
- Bank of Japan (ORG)
- Haruhiko Kuroda (PERSON)
- Trump (ORG)
- Monday (DATE)
- Japan (GPE)
- China (GPE)
- the United States (GPE)
- Trumps (PERSON)
- Friday (DATE)
- Kuroda (PERSON)
- Japanese (NORP)
- Japan (GPE)
- US (GPE)
- Trumps (PERSON)
- Japanese (NORP)
- Kuroda (PERSON)
- BOJ (ORG)
- 2022 (DATE)
- last year (DATE)
- July (DATE)
- 38-year (DATE)
- 162 (CARDINAL)
- this week (DATE)
- 148 yen (MONEY)
- BOJ (ORG)
- Kuroda (PERSON)
- first (ORDINAL)
- Kuroda (PERSON)
- BOJ (ORG)
- Kuroda (PERSON)
- 2013-2023 (DATE)
- Japan (GPE)
- decades (DATE)
- BOJ (ORG)
- 2013 (DATE)
- 2016 (DATE)
- Yen (PERSON)
- Washington (GPE)
- first (ORDINAL)
- Trump (ORG)
- Tokyo (GPE)
- Japanese (NORP)
- Kazuo Ueda (PERSON)
- BOJ (ORG)
- March last year (DATE)
- 05 (CARDINAL)
- January (DATE)
- Japan (GPE)
- 2 (CARDINAL)
- Kuroda (PERSON)
- BOJ (ORG)
- BOJ (ORG)
- Kuroda (ORG)
- January (DATE)
- Kuroda (PERSON)
- BOJ (ORG)
- the coming years (DATE)
- 2 (CARDINAL)

Entities from article 20: 
- March 7 (DATE)
- Reuters (ORG)
- Friday (DATE)
- Boeing (ORG)
- January 2024 (DATE)
- Alaskan Airlines (ORG)
- 737 (CARDINAL)
- MAX (ORG)
- 9 (CARDINAL)
- US (GPE)
- Leonie Brinkema (PERSON)
- Alexandria (GPE)
- Virginia (GPE)
- Rhode Islands (GPE)
- Boeing (ORG)
- between January 7 2021 (DATE)
- January 8 2024 (DATE)
- 2019 (DATE)
- Brinkema (ORG)
- Boeing (ORG)
- US Department of Justice (ORG)
- MAX (ORG)
- Boeing (ORG)
- two (CARDINAL)
- MAX (ORG)
- October 2018 (DATE)
- March 2019 (DATE)
- 346 (CARDINAL)

Entities from article 21: 
- March 7 (DATE)
- Reuters - The Federal Reserve (ORG)
- March 18-19 (DATE)
- US (GPE)
- Trump (ORG)
- US (GPE)
- February (DATE)
- the Labor Department (ORG)
- Friday (DATE)
- 151000 (CARDINAL)
- monthly (DATE)
- 80000 to 100000 (CARDINAL)
- Fed (ORG)
- Christopher Waller (PERSON)
- Thursday (DATE)
- Waller (ORG)
- Fed (ORG)
- overnight (TIME)
- 425 (CARDINAL)
- 2 (CARDINAL)
- 41 (CARDINAL)
- U-6 (ORG)
- 8 (CARDINAL)
- October 2021 (DATE)
- last month (DATE)
- Elon Musk (PERSON)
- Department of Government Efficiency (ORG)
- March (DATE)
- April The February (DATE)
- Julia Coronado (ORG)
- MacroPolicy Perspectives (ORG)
- DOGE (ORG)
- the months ahead (DATE)
- Fed (ORG)
- Fed (ORG)
- June (DATE)
- May (DATE)
- three (CARDINAL)
- 2025 (DATE)
- Fed (ORG)
- December (DATE)
- two (CARDINAL)
- this year (DATE)
- US (GPE)
- Donald Trumps (PERSON)
- Fed (ORG)
- Fed (ORG)
- Jerome Powell (PERSON)
- Friday (DATE)

Entities from article 22: 
- NEW YORK (GPE)
- March 7 (DATE)
- Reuters - US (ORG)
- Friday (DATE)
- Federal Reserve (ORG)
- Jerome Powell (PERSON)
- US (GPE)
- weekly (DATE)
- months (DATE)
- Powell (PERSON)
- Donald Trumps (PERSON)
- this week (DATE)
- Trumps (PERSON)
- Canada Mexico (GPE)
- China (GPE)
- weekly (DATE)
- September (DATE)
- Nasdaq (ORG)
- third straight week (DATE)
- mid-July (DATE)
- early August last year (DATE)
- Powell (PERSON)
- Fed (ORG)
- Powell (PERSON)
- Jamie Cox (PERSON)
- Harris Financial Group (ORG)
- Richmond (GPE)
- Virginia (GPE)
- Powells (ORG)
- three (CARDINAL)
- the week (DATE)
- Nasdaq (ORG)
- 10 (CARDINAL)
- December (DATE)
- Utilities SPLRCU (ORG)
- SPNY (ORG)
- SPLRCI (ORG)
- 22264 (CARDINAL)
- 052 to 4280172 (CARDINAL)
- 500 (CARDINAL)
- 3168 (CARDINAL)
- 055 (CARDINAL)
- 577020 (DATE)
- 12697 (CARDINAL)
- 070 to 1819622 (CARDINAL)
- the week (DATE)
- 31 (CARDINAL)
- Nasdaq (ORG)
- 345 (CARDINAL)
- Dow (ORG)
- 237 (CARDINAL)
- 386 Data (PRODUCT)
- Friday (DATE)
- US (GPE)
- February (DATE)
- the previous month (DATE)
- thousands (CARDINAL)
- 41 (CARDINAL)
- Morgan Stanley (ORG)
- Goldman Sachs (ORG)
- Adam Hetts (PERSON)
- Janus Henderson Investors (ORG)
- Thursday (DATE)
- Trump (PERSON)
- four-week (DATE)
- Canada (GPE)
- Mexico (GPE)
- US (GPE)
- China Reciprocal (ORG)
- Hewlett Packard Enterprise (ORG)
- 12 (CARDINAL)
- annual (DATE)
- US (GPE)
- Costco (ORG)
- 6 (CARDINAL)
- quarterly (DATE)
- 86 (DATE)
- second-quarter (DATE)
- Advancing (NORP)
- 135 (CARDINAL)
- NYSE (ORG)
- 92 (CARDINAL)
- 136 (CARDINAL)
- NYSE (ORG)
- 500 (CARDINAL)
- 8 (CARDINAL)
- 52-week (DATE)
- 13 (CARDINAL)
- 28 (CARDINAL)
- 159 (CARDINAL)
- About 1692 billion (MONEY)
- US (GPE)
- 20-day (DATE)
- 1623 billion (CARDINAL)

Entities from article 23: 
- March 7 (DATE)
- Reuters - Federal Reserve (ORG)
- Jerome Powell (PERSON)
- Friday (DATE)
- Feds (NORP)
- US (GPE)
- the end of summer (DATE)
- SEP (ORG)
- Powell (PERSON)
- New York (GPE)
- Feds (NORP)
- Feds (NORP)
- quarterly (DATE)
- 19 (CARDINAL)
- Feds (NORP)
- the next several years (DATE)
- 4 (CARDINAL)
- Feds (NORP)
- each March June September (DATE)
- December (DATE)
- Fed (ORG)
- Supporters (PERSON)
- Feds (NORP)
- US (GPE)
- zero (CARDINAL)
- Fed (ORG)
- Powell (PERSON)
- Fed (ORG)
- Fed (ORG)
- the end of 2021 (DATE)
- end-2022 (DATE)
- less than 1 (CARDINAL)
- 425 (CARDINAL)
- Fed (ORG)
- Fed (ORG)
- Powell (PERSON)
- 2018 (DATE)
- Friday (DATE)
- Fed (ORG)
- Don Kohn (PERSON)
- 19 (CARDINAL)
- Fed (ORG)
- Feds (NORP)
- The European Central Bank (ORG)
- The Reserve Bank of New Zealand (ORG)
- The Bank of England (ORG)
- Fed (ORG)
- Ben Bernanke (PERSON)

Entities from article 24: 
- March 8 (DATE)
- Reuters - US (ORG)
- Health and Human Services (ORG)
- Robert F Kennedy Jr (PERSON)
- General Mills GISN (ORG)
- March 10 (DATE)
- Politico (ORG)
- Saturday (DATE)
- first (ORDINAL)
- February (DATE)
- the Consumer Brands Association (ORG)
- Kennedy Jr (PERSON)
- White Houses (ORG)
- Cabinet (ORG)
- The US Department of Health and Human Services General Mills (ORG)
- PepsiCo (ORG)
- Reuters (ORG)
- Kennedy (PERSON)
- hundreds (CARDINAL)
- US (GPE)
- Kennedy (PERSON)
- Make America Healthy Again (WORK_OF_ART)
- the United States (GPE)
- Early last month (DATE)
- Kennedy (PERSON)
- the US Department of Health and Human Services (ORG)
- Senate (ORG)

Entities from article 25: 
- NEW YORK (GPE)
- March 7 (DATE)
- Reuters (ORG)
- US (GPE)
- US (GPE)
- February (DATE)
- 41 (CARDINAL)
- this week (DATE)
- the months (DATE)
- Jack Ablin (PERSON)
- Cresset Capital (ORG)
- Chicago (GPE)
- Ablin (PERSON)
- Nonfarm (ORG)
- 151000 (CARDINAL)
- last month (DATE)
- 125000 (CARDINAL)
- January (DATE)
- the Labor Department (ORG)
- Friday (DATE)
- Reuters (ORG)
- 160000 (CARDINAL)
- US (GPE)
- the fourth quarter (DATE)
- US (GPE)
- nearly two years (DATE)
- January (DATE)
- Gennadiy Goldberg (PERSON)
- US (GPE)
- TD Securities (ORG)
- New York (GPE)
- Donald Trumps (PERSON)
- Mexico Canada (ORG)
- China Risks (ORG)
- Mexican (NORP)
- Canadian (NORP)
- American (NORP)
- US (GPE)
- Reuters (ORG)
- this week (DATE)
- Chris Grisanti (PERSON)
- MAI Capital Management (ORG)
- monthly (DATE)
- Fridays (DATE)
- this week (DATE)
- Thursday (DATE)
- December (DATE)
- weekly (DATE)
- six months (DATE)
- Friday (DATE)
- 500 (CARDINAL)
- afternoon (TIME)
- mid-day (DATE)
- Federal Reserve (ORG)
- Jerome Powell (PERSON)
- US (GPE)
- Trump (ORG)
- Powell (PERSON)
- Powell (PERSON)
- Lindsey Bell (PERSON)
- Clearnomics For (ORG)
- Fridays (DATE)
- Carson Group (ORG)
- Sonu Varghese (PERSON)
- US (GPE)
- US (GPE)
- US (GPE)
- Talley Leger (PERSON)
- The Wealth Consulting Group (ORG)
- 3-week (DATE)
- 2656 (DATE)
- Friday (DATE)
- Torsten Slok (PERSON)
- Apollo Global Management (ORG)
- Fed (ORG)
- Fed (ORG)
- 425 (CARDINAL)
- last month (DATE)
- March 18-19 (DATE)
- Fed (ORG)
- about three (CARDINAL)
- 2025 (DATE)
- Fed (ORG)
- the coming months (DATE)
- Seema Shah (PERSON)
- Principal Asset Management (ORG)

Entities from article 26: 
- March 8 (DATE)
- Reuters - China (ORG)
- over 26 billion (MONEY)
- Canadian (NORP)
- Saturday (DATE)
- Ottawa (GPE)
- October (DATE)
- US (GPE)
- Donald Trumps (PERSON)
- the commerce ministry (ORG)
- March 20 (DATE)
- 100 and 25 (CARDINAL)
- Canada (GPE)
- China (GPE)
- just over four months ago (DATE)
- Canadas (ORG)
- China (GPE)
- last year (DATE)
- Beijing (GPE)
- Trump (PERSON)
- 25 (CARDINAL)
- the White House (ORG)
- Canada (GPE)
- Mexico (GPE)
- 20 (CARDINAL)
- Chinese (NORP)
- Canadas (ORG)
- World Trade Organization (ORG)
- Chinas (ORG)
- the commerce ministry (ORG)
- China (GPE)
- 100 (CARDINAL)
- just over 1 billion (MONEY)
- Canadian (NORP)
- 25 (CARDINAL)
- 16 billion (CARDINAL)
- Canadian (NORP)
- Dan Wang (PERSON)
- China (GPE)
- Eurasia Group (ORG)
- Singapore (GPE)
- China (GPE)
- Canada (GPE)
- American (NORP)
- Chinas (PERSON)
- Ottawas (GPE)
- October (DATE)
- US (GPE)
- European Union Canada (ORG)
- Canadian (NORP)
- Beijing (GPE)
- Reuters (ORG)
- Canadian (NORP)
- Justin Trudeau (PERSON)
- August (DATE)
- Ottawa (GPE)
- Chinas intentional state (ORG)
- the United States (GPE)
- European Union (ORG)
- Chinese (NORP)
- China (GPE)
- September (DATE)
- Canadian (NORP)
- More than half (CARDINAL)
- Canadas (ORG)
- China (GPE)
- 37 billion (CARDINAL)
- 2023 (DATE)
- the Canola Council of Canada (ORG)
- Canadian (NORP)
- Rosa Wang (PERSON)
- Beijing (GPE)
- Ottawa (GPE)
- Canadas (ORG)
- October 20 (DATE)
- China (GPE)
- Canadas (ORG)
- second (ORDINAL)
- the United States (GPE)
- Canada (GPE)
- 47 billion (CARDINAL)
- second (ORDINAL)
- 2024 (DATE)
- Chinese (NORP)
- China (GPE)
- Canadas (ORG)
- third (ORDINAL)
- Canada (GPE)
- Cam Dahl (ORG)
- the Manitoba Pork Council (ORG)
- China (GPE)
- China (GPE)
- Mexico (GPE)
- China (GPE)
- Canadas (ORG)
- two (CARDINAL)
- Chris Davison (PERSON)
- the Canola Council of Canada (ORG)
- Canadian (NORP)
- China (GPE)
- Beijing (GPE)
- Australia (GPE)
- China (GPE)
- 2020 (DATE)
- Australian (NORP)
- Canberra (GPE)
- COVID (ORG)
- Beijing (GPE)
- 2023 one year (DATE)
- Australian (NORP)
- Anthony Albanese (PERSON)
- Scott Morrison (PERSON)

Entities from article 27: 
- March 7 (DATE)
- Reuters (ORG)
- The US Department of Justice (ORG)
- Friday (DATE)
- Alphabets Google GOOGLO (ORG)
- OpenAI (ORG)
- DOJ (ORG)
- 38 (CARDINAL)
- Google (ORG)
- Chrome (PRODUCT)
- Googles (PERSON)
- Washington (GPE)
- American (NORP)
- Google (ORG)
- Courts (ORG)
- Americas (LOC)
- Donald Trump (PERSON)
- Big Tech (ORG)
- first (ORDINAL)
- Joe Bidens (PERSON)
- Trump (ORG)
- Gail Slater (PERSON)
- Google (ORG)
- billions of dollars (MONEY)
- OpenAI (GPE)
- Microsoft (ORG)
- Anthropic (NORP)
- February (DATE)
- November (DATE)
- Google (ORG)
- AI (GPE)
- Friday (DATE)
- Google (ORG)
- AI Google (ORG)
- Apple AAPLO (ORG)
- Google (ORG)
- US (GPE)
- Amit Mehta (PERSON)
- April (DATE)
- one (CARDINAL)
- US (GPE)
- Big Tech (ORG)
- Apple Meta Platforms (ORG)
- Amazoncom (ORG)
- Trumps (PERSON)
- Google (ORG)
- AI (GPE)
- Americas (LOC)
- November (DATE)
- Google (ORG)
- Google (ORG)
- Democratic (NORP)
- Republican (NORP)
- the Alphabet Workers Union-CWA (ORG)

Entities from article 28: 
- HOUSTON (GPE)
- March 10 (DATE)
- Reuters - US Energy (ORG)
- Chris Wright (PERSON)
- Monday (DATE)
- Joe Bidens (PERSON)
- Wright (PERSON)
- CERAWeek (PRODUCT)
- Houston (GPE)
- Trump (ORG)
- Joe Biden (PERSON)
- zero (CARDINAL)

Entities from article 29: 
- March 10 (DATE)
- Reuters - ArcelorMittal (ORG)
- Monday (DATE)
- more than 270 million (MONEY)
- 2754 million (CARDINAL)
- Dunkirk (ORG)
- Fos (ORG)
- several months (DATE)
- April 15 (DATE)
- second (ORDINAL)
- 90 days (DATE)
- Dunkirk (ORG)
- Europe (LOC)
- Dunkirk (ORG)
- 254 million (CARDINAL)
- 183 million (CARDINAL)
- Fos-sur-mer (LOC)
- 1 09222 (CARDINAL)

Entities from article 30: 
- March 10 (DATE)
- Reuters - Wall Streets (ORG)
- Monday (DATE)
- US (GPE)
- Donald Trumps (PERSON)
- the weekend (DATE)
- Nasdaq (ORG)
- five-month (DATE)
- 0950 (DATE)
- 30775 (CARDINAL)
- 072 to 4249247 (CARDINAL)
- 500 (CARDINAL)
- 7489 (CARDINAL)
- 130 to 569531 (CARDINAL)
- 37178 (CARDINAL)
- 203 (CARDINAL)
- Nvidia (GPE)
- 22 (CARDINAL)
- Meta METAO (PERSON)
- Amazoncom (ORG)
- more than 3 (CARDINAL)
- Tesla TSLAO (ORG)
- 7 (CARDINAL)
- November 5 (DATE)
- UBS (ORG)
- first-quarter (DATE)
- 26 (CARDINAL)
- 2000 (DATE)
- 1 (CARDINAL)
- Goldman Sachs (ORG)
- more than 3 (CARDINAL)
- Sunday (DATE)
- Trump (PERSON)
- US (GPE)
- Mexico Canada (ORG)
- China (GPE)
- Chinas (ORG)
- US (GPE)
- Monday (DATE)
- US (GPE)
- later in the week (DATE)
- Art Hogan (PERSON)
- B Riley Wealth (ORG)
- Reuters (ORG)
- 91 (CARDINAL)
- Trumps (PERSON)
- HSBC (ORG)
- US (GPE)
- weekly (DATE)
- September (DATE)
- Friday (DATE)
- Monday (DATE)
- more than 10 (CARDINAL)
- December (DATE)
- last week (DATE)
- CBOE Volatility (ORG)
- December (DATE)
- later in the week (DATE)
- Friday (DATE)
- Fed (ORG)
- Jerome Powells (PERSON)
- The Federal Open Market Committee (ORG)
- next week (DATE)
- the first half of this year (DATE)
- LSEG US (PERSON)
- Chinese (NORP)
- Alibaba (GPE)
- 31 (CARDINAL)
- Bilibili (ORG)
- 5 (CARDINAL)
- China (GPE)
- second (ORDINAL)
- Crypto (PRODUCT)
- MicroStrategy (ORG)
- 10 (CARDINAL)
- Coinbase COINO (ORG)
- 9 (CARDINAL)
- Riot RIOTO (PERSON)
- 52 (CARDINAL)
- 283 (CARDINAL)
- NYSE (ORG)
- Nasdaq (ORG)
- 298 (CARDINAL)
- 500 (CARDINAL)
- three (CARDINAL)
- 52-week (DATE)
- one (CARDINAL)
- 10 (CARDINAL)
- 40 (CARDINAL)

Entities from article 31: 
- March 10 (DATE)
- Reuters - Assura (ORG)
- Monday (DATE)
- 161 billion pound (MONEY)
- 21 billion (CARDINAL)
- US (GPE)
- KKR (ORG)
- Stonepeak Partners (ORG)
- British (NORP)
- 14 (CARDINAL)
- Assuras (ORG)
- about 465 pence (MONEY)
- just over half (CARDINAL)
- 88 pence (MONEY)
- 2020 (DATE)
- 494 pence (MONEY)
- Assura (ORG)
- four (CARDINAL)
- KKR (ORG)
- 48 pence (MONEY)
- February Mondays (DATE)
- KKR (ORG)
- New York (GPE)
- Stonepeak (PERSON)
- 319 (CARDINAL)
- February 13 a day (DATE)
- KKR (ORG)
- 213 (CARDINAL)
- Fridays (DATE)
- September (DATE)
- more than 600 (CARDINAL)
- about 32 billion pounds (MONEY)
- Britains National Health Service (ORG)
- Assura (ORG)
- Monday (DATE)
- Primary Health Properties (ORG)
- Assura (ORG)
- 43 pence (MONEY)
- KKR (ORG)
- Stonepeak (PERSON)
- PHP (ORG)
- April 7 (DATE)
- British (NORP)
- PHP (ORG)
- Assura (ORG)
- 1 07747 pounds (QUANTITY)

Entities from article 32: 
- March 10 (DATE)
- Reuters - Alberta (ORG)
- Danielle Smith (PERSON)
- Monday (DATE)
- Canadian (NORP)
- the United States (GPE)
- US (GPE)
- Donald Trumps (PERSON)
- US (GPE)
- Canadian (NORP)
- zero (CARDINAL)
- CERAWeek (LOC)
- Houston (GPE)
- Canada (GPE)
- US (GPE)
- 2 million barrels (QUANTITY)
- Albertas (ORG)
- Alberta (GPE)
- the United States (GPE)
- Americas (LOC)
- Trump (PERSON)
- the United States (GPE)
- Alberta (GPE)
- Smith (ORG)
- Canadian (NORP)
- Spain (GPE)
- India (GPE)
- Canadian (NORP)

Entities from article 33: 
- March 10 (DATE)
- Reuters - HSBC (ORG)
- Monday (DATE)
- US (GPE)
- European (NORP)
- Germany (GPE)
- US (GPE)
- European (NORP)
- UK (GPE)
- Trump (ORG)
- 12 trillion (MONEY)
- European (NORP)
- China (GPE)
- the United States (GPE)
- 500 (CARDINAL)
- about 61 (CARDINAL)
- February 19 (DATE)
- US (GPE)
- Global Equity Strategist (ORG)
- Alastair Pinder (PERSON)
- Morgan Stanley Equity (ORG)
- Michael Wilson (PERSON)
- mid-year before ending the year (DATE)
- 127 upside (MONEY)
- Morgan Stanleys Wilson (ORG)
- Monday (DATE)

Entities from article 34: 
- March 10 (DATE)
- Reuters (ORG)
- 2025 (DATE)
- Volkswagens (GPE)
- Traton 8TRADE (ORG)
- Monday (DATE)
- Scania (ORG)
- 5 (CARDINAL)
- 1055 (DATE)
- GMT (ORG)
- Daimler Truck (ORG)
- Volvo (ORG)
- 2025 (DATE)
- -5 to 5 (CARDINAL)
- between 75 and 85 (DATE)
- the second half of 2025 (DATE)
- US (GPE)
- Mexico (GPE)
- 65 (CARDINAL)
- the United States (GPE)
- last year (DATE)
- Mexican (NORP)
- European (NORP)
- this year (DATE)
- US (GPE)
- the Unites States (GPE)
- European (NORP)
- last year (DATE)
- 2023 (DATE)
- Swedens Volvo (ORG)
- January (DATE)
- the fourth quarter (DATE)
- European (NORP)
- fourth-quarter (DATE)
- German (NORP)
- Europe (LOC)
- 92 (CARDINAL)
- 2024 (DATE)
- 2023 (DATE)
- MAN (ORG)
- 13 (CARDINAL)
- 170 (CARDINAL)
- 850 million (CARDINAL)
- Volkswagen (ORG)
- almost 90 (CARDINAL)
- almost a fifth (CARDINAL)
- 06 (CARDINAL)
- Traton (GPE)
- Tratons (PRODUCT)
- Volkswagen (ORG)
- thousands (CARDINAL)
- Germany (GPE)
- European (NORP)
- China (GPE)
- EV (ORG)

Entities from article 35: 
- March 10 - Morning Bid US (DATE)
- US (GPE)
- Mike Dolan (PERSON)
- Financial Markets (ORG)
- Donald Trump (PERSON)
- first (ORDINAL)
- Commerce (ORG)
- Howard Lutnick (PERSON)
- Sunday (DATE)
- Trump (PERSON)
- Fox News (ORG)
- 500 (CARDINAL)
- 31 last week (DATE)
- Nasdaq (ORG)
- 345 (CARDINAL)
- Dow Jones (ORG)
- 24 (CARDINAL)
- 39 (CARDINAL)
- Meet the Press (WORK_OF_ART)
- Trumps (PERSON)
- US (GPE)
- February (DATE)
- Friday (DATE)
- Federal Reserve (ORG)
- Jerome Powell (PERSON)
- Powell (PERSON)
- Fed (ORG)
- Monday (DATE)
- Treasury (ORG)
- last weeks (DATE)
- Chinese (NORP)
- weekend (DATE)
- European (NORP)
- Canadas (ORG)
- Bank of Canada (ORG)
- Bank of England (ORG)
- Mark Carney (PERSON)
- Today Ill (PERSON)
- years (DATE)
- Central (ORG)
- less than two months (DATE)
- US (GPE)
- Donald Trumps (PERSON)
- decades-old (DATE)
- US (GPE)
- German (NORP)
- European (NORP)
- China (GPE)
- 5 (CARDINAL)
- US (GPE)
- US (GPE)
- Wall Street Caught (ORG)
- the Federal Reserve (ORG)
- next week (DATE)
- some 12 months (DATE)
- Fed (ORG)
- Fed (ORG)
- Jerome Powell (PERSON)
- Friday (DATE)
- this weeks (DATE)
- last months (DATE)
- the end of the first quarter (DATE)
- the years (DATE)
- Fed (ORG)
- 2025 (DATE)
- AXA Investment (ORG)
- Chris Iggo (PERSON)
- US (GPE)
- Europe (LOC)
- The European Central Bank (ORG)
- last week (DATE)
- Germany (GPE)
- European (NORP)
- June (DATE)
- ECB (ORG)
- Trumps (PERSON)
- US (GPE)
- Europe (LOC)
- Berlins (PERSON)
- last weeks (DATE)
- weekly (DATE)
- 16 years (DATE)
- Transatlantic (ORG)
- US (GPE)
- European (NORP)
- the year (DATE)
- Fiscal (ORG)
- Stephen Jen (PERSON)
- last week (DATE)
- Bond (ORG)
- decades (DATE)
- Chart (ORG)
- the day (DATE)
- US (GPE)
- February (DATE)
- 460000 (CARDINAL)
- monthly (DATE)
- June 2023 (DATE)
- 49 million (CARDINAL)
- May 2021 (DATE)
- 80 (CARDINAL)
- October 2021 (DATE)
- 8860 million (CARDINAL)
- 8764 million (CARDINAL)
- January (DATE)
- 54 (CARDINAL)
- April 2009 (DATE)
- US (GPE)
- February (DATE)
- New York Federal Reserve (ORG)
- February (DATE)
- Euro (PERSON)
- Brussels (GPE)
- European Central Bank (ORG)
- Christine Lagarde (PERSON)
- Piero Cipollone (PERSON)
- Ukrainian (NORP)
- Volodymyr Zelenskyy (PERSON)
- Saudi Arabia (GPE)
- Saudi (NORP)
- Mohammed Bin Salman (PERSON)
- US (GPE)
- Oracle Opinions (ORG)
- Reuters News (ORG)

Entities from article 36: 
- LONDON (GPE)
- March 10 (DATE)
- Reuters - Wall Street (ORG)
- Monday (DATE)
- China (GPE)
- US (GPE)
- 112 (CARDINAL)
- 137 (CARDINAL)
- 165 (CARDINAL)
- 03 (CARDINAL)
- almost two months (DATE)
- European (NORP)
- Europes (PERSON)
- as much as (CARDINAL)
- 4-month (DATE)
- 26 (CARDINAL)
- This week (DATE)
- a big week (DATE)
- the next quarters (DATE)
- the years (DATE)
- James Rossiter (PERSON)
- TD Securities (ORG)
- Rossiter added European Union (ORG)
- this week (DATE)
- EU (ORG)
- the European Investment Bank (ORG)
- German (NORP)
- last week (DATE)
- 500 billion (MONEY)
- 541 billion (CARDINAL)
- 10 years (DATE)
- Germanys (GPE)
- later this week (DATE)
- European Union (ORG)
- the United States (GPE)
- Federal Reserve (ORG)
- Chair Jerome Powell (PERSON)
- US (GPE)
- Chinas (ORG)
- 13 months (DATE)
- February (DATE)
- second (ORDINAL)
- Chinas (ORG)
- 04 (CARDINAL)
- the Shanghai Composite Index (ORG)
- 02 (CARDINAL)
- Hong Kongs (GPE)
- Hang Seng (PERSON)
- 19 (CARDINAL)
- 07 to 146975 (DATE)
- Beijing (GPE)
- the start of the week-long (DATE)
- National Peoples Congress (ORG)
- Tuesday (DATE)
- US (GPE)
- Donald Trump (PERSON)
- Fox News (ORG)
- Sunday (DATE)
- China (GPE)
- Canada (GPE)
- Mexico (GPE)
- US (GPE)
- US (GPE)
- Friday (DATE)
- monthly (DATE)
- February (DATE)
- first (ORDINAL)
- Trumps (PERSON)
- Trumps (PERSON)
- Kyle Rodda (PERSON)
- Capitalcom (ORG)
- first (ORDINAL)
- 10-year (DATE)
- US Treasury (ORG)
- 7 (CARDINAL)
- 42474 (DATE)
- two-year (DATE)
- 6 (CARDINAL)
- 3945 (DATE)
- US (GPE)
- six (CARDINAL)
- 10379 (DATE)
- 10841 (DATE)
- 12926 (DATE)
- Canada (GPE)
- Friday (DATE)
- Canadian (NORP)
- Tuesday (DATE)
- Wednesday (DATE)
- Thursday (DATE)
- Friday (DATE)
- Michael Brown (PERSON)
- Pepperstone (ORG)
- US (GPE)
- Russian (NORP)
- Russian (NORP)
- Ukraine (GPE)
- 44 cents (MONEY)
- 7070 (DATE)
- US (GPE)
- 51 cents (MONEY)
- as much as (PERCENT)
- Friday (DATE)
- this month (DATE)
- 8008542 (DATE)
- 82982 (DATE)
- Friday (DATE)

Entities from article 37: 
- CHICAGOWASHINGTON (ORG)
- March 10 (DATE)
- Belgrade (GPE)
- Montana (PERSON)
- 648000 (CARDINAL)
- USDAs Agricultural Marketing Service (ORG)
- about 150 (CARDINAL)
- the Trump Administrations (WORK_OF_ART)
- about 500 tons (QUANTITY)
- Colorado (GPE)
- Last week (DATE)
- Washington DC (GPE)
- USDA (ORG)
- Farmers (ORG)
- USDA (ORG)
- more than two dozen (CARDINAL)
- seven (CARDINAL)
- Reuters (ORG)
- Trump (ORG)
- Canada Mexico (GPE)
- China (GPE)
- US (GPE)
- Trump (ORG)
- March 6 (DATE)
- April 2 (DATE)
- 191 billion (CARDINAL)
- American (NORP)
- US (GPE)
- Powell-Palm (ORG)
- March 3 (DATE)
- Trump (PERSON)
- Truth Social (ORG)
- Trump (ORG)
- Trump (PERSON)
- US (GPE)
- November (DATE)
- Two (CARDINAL)
- Reuters (ORG)
- weeks (DATE)
- USDA (ORG)
- White House (ORG)
- Anna Kelly (PERSON)
- USDA (ORG)
- US (GPE)
- Trump (ORG)
- American (NORP)
- last fall (DATE)
- Trump (PERSON)
- the White House (ORG)
- first (ORDINAL)
- Trump (ORG)
- about 217 billion (MONEY)
- four-year (DATE)
- 1933 (DATE)
- Reuters (ORG)
- USDA (ORG)
- 1984 to 1988 (DATE)
- America (GPE)
- Brooke Rollins (PERSON)
- USDA (ORG)
- USDA (ORG)
- hundreds (CARDINAL)
- US (GPE)
- 161 billion (MONEY)
- USDA (ORG)
- between fiscal years 2019 through 2023 (DATE)
- December (DATE)
- the US Government Accountability Office (ORG)
- Reuters (ORG)
- Joe Bidens (PERSON)
- more than 20 billion (MONEY)
- Trump (ORG)
- his first days (DATE)
- the White House (ORG)
- January 22 (DATE)
- Rollins (PERSON)
- February 20 (DATE)
- USDA (ORG)
- The White House (ORG)
- Two (CARDINAL)
- Trump (ORG)
- Trumps (PERSON)
- Congressional (NORP)
- USDA (ORG)
- Dave Walton (PERSON)
- Muscatine County Iowa (GPE)
- Trumps (PERSON)
- Walton (PERSON)
- 6000 (CARDINAL)
- USDA (ORG)
- Steve Tucker (PERSON)
- 400000 (CARDINAL)
- Agricultural Marketing Service (ORG)
- Nebraska (GPE)
- this years (DATE)
- US (GPE)
- Ed (PERSON)
- Becky Morgan (PERSON)
- years (DATE)
- USDA (ORG)
- Morgans (NORP)
- Spencer Moss (PERSON)
- the West Virginia Food and Farm Coalition (ORG)
- Charleston (GPE)
- Reuters (ORG)
- Reuters (ORG)
- The West Virginia Food and Farm Coalition (ORG)
- about 80 (CARDINAL)
- USDA (ORG)
- Moss (PERSON)
- USDA (ORG)
- Moss (PERSON)
- January 19 (DATE)
- Trump (PERSON)
- Moss (PERSON)
- Farmers (ORG)
- USAID (ORG)
- the State Department (ORG)
- Trump (ORG)
- less than 100 million (MONEY)
- roughly 40 billion (MONEY)
- USAID (ORG)
- annually (DATE)
- Reuters The Supreme Court (ORG)
- March 5 (DATE)
- US (GPE)
- 55 (CARDINAL)
- 2024 (DATE)
- a year earlier (DATE)
- United States Court (ORG)
- Jillian Blanchard (PERSON)
- Lawyers for Good Government (WORK_OF_ART)
- about 100 (CARDINAL)
- USDA (ORG)

Entities from article 38: 
- FRANKFURT (ORG)
- March 10 (DATE)
- Reuters - Ford (ORG)
- 44 billion (CARDINAL)
- 48 billion (CARDINAL)
- German (NORP)
- European (NORP)
- US (GPE)
- Monday (DATE)
- Ford-Werke German (ORG)
- 58 billion (CARDINAL)
- Ford (ORG)
- Asian (NORP)
- the United States (GPE)
- Ford (ORG)
- thousands (CARDINAL)
- Europe (LOC)
- Germany (GPE)
- Volkswagen (ORG)
- German (NORP)
- Europe (LOC)
- John Lawler (PERSON)
- Ford Motor Company (ORG)
- Europe (LOC)
- Slower (ORG)
- Europe (LOC)
- Ford (ORG)
- last year (DATE)
- European (NORP)
- the end of the decade (DATE)
- Ford-Werke (ORG)
- multi-year (DATE)
- Ford (ORG)
- German (NORP)
- 2006 (DATE)
- IG Metall (ORG)
- Fords German (PERSON)
- the coming years (DATE)
- USA (GPE)
- IG Metall (ORG)
- Fords Lawler (PERSON)
- European (NORP)
- 1 (CARDINAL)
- 09249 (DATE)

Entities from article 39: 
- HOUSTON (GPE)
- March 10 (DATE)
- Reuters - Policymakers (ORG)
- Saudi Aramco (ORG)
- Monday (DATE)
- Donald Trump (PERSON)
- US (GPE)
- Joe Biden (PERSON)
- Europe (LOC)
- Russias (PERSON)
- Ukraine (GPE)
- 2022 (DATE)
- European (NORP)
- Aramco (ORG)
- Amin Nasser (PERSON)
- CERAWeek (LOC)
- Houston (GPE)
- Nasser (PERSON)
- Elvis (PERSON)
- Deregulation (ORG)
- Aramco (ORG)
- more than 50 billion (MONEY)
- last year (DATE)
- Nasser (PERSON)
- up to 12 (CARDINAL)
- 2030 (DATE)
- last year (DATE)
- Nasser (PERSON)

Entities from article 40: 
- ABOARD AIR FORCE ONE March 9 (ORG)
- Reuters - US (ORG)
- Donald Trump (PERSON)
- Sunday (DATE)
- four (CARDINAL)
- Chinese (NORP)
- TikTok (ORG)
- ByteDance (ORG)
- January 19 (DATE)
- Trump (ORG)
- January 20 (DATE)
- 75 days (DATE)
- Trump (PERSON)
- the Air Force (ORG)
- four (CARDINAL)
- four (CARDINAL)
- TikTok (ORG)
- ByteDance (ORG)
- Reuters (ORG)
- normal business hours (TIME)
- TikTok (ORG)
- Los Angeles Dodgers (ORG)
- Frank McCourt (PERSON)
- as much as 50 billion (PERCENT)

Entities from article 41: 
- LONDON (GPE)
- March 10 (DATE)
- Reuters (ORG)
- Monday (DATE)
- London (GPE)
- US (GPE)
- John Moores (PERSON)
- LzLabs (ORG)
- IBM (ORG)
- Switzerland (GPE)
- LzLabs (ORG)
- two (CARDINAL)
- English (NORP)
- Major League Baseballs San Diego Padres (ORG)
- BMC Software (ORG)
- 1980 (DATE)
- the High Court (ORG)
- LzLabs UK (ORG)
- Winsopia (PERSON)
- IBM (ORG)
- IBM (ORG)
- 2013 (DATE)
- LzLabs (ORG)
- LzLabs (ORG)
- nearly a decade (DATE)
- the High Court (ORG)
- Finola OFarrell (PERSON)
- Winsopia (ORG)
- IBM (ORG)
- LzLabs (ORG)
- Mondays (DATE)
- last year (DATE)
- IBM (ORG)
- British (NORP)
- LzLabs Limited (ORG)
- LzLabs (ORG)
- IBM (ORG)
- LzLabs (ORG)

Entities from article 42: 
- March 12 (DATE)
- Reuters (ORG)
- five-month (DATE)
- Wednesday (DATE)
- Ukraines (ORG)
- month (DATE)
- US (GPE)
- European (NORP)
- 11 (CARDINAL)
- FTSE (ORG)
- 05 (CARDINAL)
- US (GPE)
- Ukraine (GPE)
- Kyiv (PERSON)
- US (GPE)
- Russian (NORP)
- Sergei Lavrov (PERSON)
- Wednesday (DATE)
- Ukraine (GPE)
- Moscow (GPE)
- Russian (NORP)
- October (DATE)
- Tuesday (DATE)
- 10947 (DATE)
- 10913 (DATE)
- Asia (LOC)
- seven-month (DATE)
- the previous day (DATE)
- US (GPE)
- 25 (CARDINAL)
- Wednesday (DATE)
- Asian (NORP)
- Europe (LOC)
- Asia-Pacific (LOC)
- Japan (GPE)
- Australias (LOC)
- 96 (CARDINAL)
- Hong Kong (GPE)
- HSI (ORG)
- China (GPE)
- South Korea (GPE)
- Taiwan (GPE)
- Japans Nikkei N225 (LAW)
- six-month (DATE)
- a day earlier (DATE)
- 500 (CARDINAL)
- 10 (CARDINAL)
- about 08 (CARDINAL)
- Donald Trump (PERSON)
- Canada (GPE)
- 50 (CARDINAL)
- Ontario (GPE)
- months (DATE)
- US (GPE)
- US (GPE)
- roughly 40 (CARDINAL)
- the year (DATE)
- JP Morgan (ORG)
- Bruce Kasman (PERSON)
- Singapore (GPE)
- US (GPE)
- US (GPE)
- Dicks Sporting Goods (ORG)
- DKSN (ORG)
- 57 (CARDINAL)
- Kohls Corp KSSN (ORG)
- 24 (CARDINAL)
- Travel (ORG)
- Delta Air Lines (ORG)
- half (CARDINAL)
- United UALO (ORG)
- American Airlines (ORG)
- day (DATE)
- US (GPE)
- February (DATE)
- Canada (GPE)
- Trumps (PERSON)
- seventh (ORDINAL)
- two weeks ago (DATE)
- Canadian (NORP)
- one-week (DATE)
- overnight (TIME)
- US (GPE)
- 02 (CARDINAL)
- five-month (DATE)
- 148 per dollar (MONEY)
- Australian (NORP)
- 63 US cents (MONEY)
- Brent (ORG)

Entities from article 43: 
- March 12 (DATE)
- Reuters - Volkswagen (ORG)
- Ecarx ECXO (PERSON)
- Chinese (NORP)
- Europe (LOC)
- United States (GPE)
- Wednesday (DATE)
- Volkswagen (ORG)
- Brazil (GPE)
- India (GPE)
- Ecarxs (NORP)
- Antora (PERSON)
- 1000 (DATE)
- two (CARDINAL)
- VWs Skoda-branded (PERSON)
- Europe (LOC)
- Ecarx (PERSON)
- US (GPE)
- Shen Ziyu (PERSON)
- Reuters VW (ORG)
- Western (NORP)
- Chinese (NORP)
- China (GPE)
- recent years (DATE)
- Chinese (NORP)
- German (NORP)
- Mercedes-Benz MBGnDE (ORG)
- Chinese (NORP)
- Hesais (GPE)
- Reuters (ORG)
- Tuesday (DATE)
- first (ORDINAL)
- Chinese (NORP)
- China (GPE)
- Shen (PERSON)
- more than a year (DATE)
- Volkswagen (ORG)
- 13 (CARDINAL)
- South Korean (NORP)
- LG (ORG)
- Samsung (ORG)
- Chinese (NORP)
- RD (ORG)
- Asia Shen (FAC)
- Europe (LOC)
- Volkswagen (ORG)
- Cariad (ORG)
- almost 30 (CARDINAL)
- the end of the year (DATE)
- Handelsblatt (PERSON)
- Tuesday (DATE)
- Ecarx (GPE)
- 70 (CARDINAL)
- Chinese (NORP)
- Half (CARDINAL)
- 2030 (DATE)
- Ecarx (GPE)
- RD (ORG)
- Shen (PERSON)
- Chinese (NORP)
- Chinas (ORG)
- Shen (PERSON)
- China (GPE)
- 10 (CARDINAL)
- 15 years (DATE)

Entities from article 44: 
- YORKTAIPEI (ORG)
- March 12 (DATE)
- Reuters - TSMC 2330TW (ORG)
- US (GPE)
- Nvidia (GPE)
- Advanced Micro (ORG)
- Intels INTCO (ORG)
- four (CARDINAL)
- Taiwanese (NORP)
- Intels (ORG)
- more than 50 (CARDINAL)
- Qualcomm QCOMO (PERSON)
- TSMC (ORG)
- one (CARDINAL)
- US (GPE)
- Donald Trumps (PERSON)
- TSMC (ORG)
- US (GPE)
- TSMC (ORG)
- 50 (CARDINAL)
- first (ORDINAL)
- Trump (ORG)
- Intel (ORG)
- Intel TSMC Nvidia (ORG)
- Qualcomm (PERSON)
- The White House (ORG)
- Broadcom (ORG)
- US (GPE)
- more than half (CARDINAL)
- the last year (DATE)
- Intel (ORG)
- 2024 (DATE)
- 188 billion (MONEY)
- first (ORDINAL)
- 1986 (DATE)
- 108 billion (MONEY)
- December 31 (DATE)
- Trump (ORG)
- Intels (ORG)
- American (NORP)
- three (CARDINAL)
- Taiwanese (NORP)
- Trump (ORG)
- March 3 (DATE)
- 100 billion (CARDINAL)
- the United States (GPE)
- five (CARDINAL)
- coming years (DATE)
- Intels (ORG)
- three (CARDINAL)
- TSMC (ORG)
- more than one (CARDINAL)
- Intel (ORG)
- two (CARDINAL)
- four (CARDINAL)
- US (GPE)
- Qualcomm (PERSON)
- Intel (ORG)
- Intel (ORG)
- TSMC (ORG)
- two (CARDINAL)
- Intels (ORG)
- Pat Gelsingers (PERSON)
- Intel (ORG)
- Gelsinger (PRODUCT)
- December (DATE)
- two (CARDINAL)
- AI (WORK_OF_ART)
- TSMC (ORG)
- Intel (ORG)
- two (CARDINAL)
- Intel (ORG)
- Israels Tower Semiconductor (FAC)
- two (CARDINAL)
- Taiwanese (NORP)
- Intel (ORG)
- one (CARDINAL)
- Reuters (ORG)
- last week (DATE)
- Nvidia (GPE)
- Broadcom (NORP)
- Intel (ORG)
- AMD (ORG)
- Intels (ORG)
- 18A (CARDINAL)
- 18A (DATE)
- Intel (ORG)
- TSMC (ORG)
- two (CARDINAL)
- February (DATE)
- Intel (ORG)
- TSMC (ORG)
- 18A (CARDINAL)
- 2 (CARDINAL)

Entities from article 45: 
- March 12 (DATE)
- Reuters - Cathay Pacific Airways 0293HK (ORG)
- full-year (DATE)
- Wednesday (DATE)
- Asias (NORP)
- Hong Kongs (ORG)
- 1 (CARDINAL)
- HK989 billion 127 billion (MONEY)
- the year ended December 31 (DATE)
- SmartEstimates HK849 billion (PRODUCT)
- Shares (PERSON)
- Hong Kong (GPE)
- 18 (CARDINAL)
- almost 4 (CARDINAL)
- May 2019 (DATE)
- Cathay Pacifics (ORG)
- annual (DATE)
- 12 (CARDINAL)
- HK Express (ORG)
- 23 year-on-year (DATE)
- CFO (ORG)
- Rebecca Sharpe (PERSON)
- Ronald Lam (PERSON)
- around 30 (CARDINAL)
- this year (DATE)
- COVID-19 (GPE)
- Hong Kong (GPE)
- China (GPE)
- HK Express (ORG)
- Cathay (ORG)
- 2019 (DATE)
- a full-year (DATE)
- HK400 million (ORG)
- HK433 million (CARDINAL)
- the year (DATE)
- Cathay (ORG)
- HK Express (ORG)
- Pratt Whitney (PERSON)
- Airbus (ORG)
- 2024 (DATE)
- OAG (ORG)
- HK Express (ORG)
- 2024 (CARDINAL)
- HK Express (ORG)
- Cathay (ORG)
- Based (PERSON)
- Cathay Pacific (ORG)
- Asias (NORP)
- recent years (DATE)
- China Cargo (ORG)
- 3 (CARDINAL)
- Lam (PERSON)
- Cathay (ORG)
- US (GPE)
- European Union (ORG)
- Russian (NORP)
- Ukraine Cathay (ORG)
- Russia (GPE)
- North American (NORP)
- European (NORP)
- Lam (PERSON)
- Russia (GPE)
- Russian (NORP)
- second (ORDINAL)
- annual (DATE)
- three years (DATE)
- Cathay Pacifics (ORG)
- the year (DATE)
- 105 (CARDINAL)
- Visible Alpha (PERSON)
- first (ORDINAL)
- HK100 billion (MONEY)
- 2019 (DATE)
- CFO Sharpe (ORG)
- 2023-2024 (DATE)
- HK100 billion (MONEY)
- seven years (DATE)
- Cathay (ORG)
- first (ORDINAL)
- Boeings (ORG)
- 777X (CARDINAL)
- early 2027 (DATE)
- Hong Kong (GPE)
- November (DATE)
- three (CARDINAL)
- Cathay Pacifics (ORG)
- one (CARDINAL)
- HK578 million (MONEY)
- Air China (ORG)
- 601111SS (CARDINAL)
- Air China Cargo (ORG)
- second (ORDINAL)
- HK049 (PERSON)
- Visible Alpha (PERSON)
- HK042 (ORG)
- 1 (CARDINAL)

Entities from article 46: 
- BRUSSELS (ORG)
- March 12 (DATE)
- Reuters - The European Union (ORG)
- 26 billion (MONEY)
- 28 billion (CARDINAL)
- US (GPE)
- next month (DATE)
- the European Commission (ORG)
- Wednesday (DATE)
- US (GPE)
- EU (ORG)
- US (GPE)
- Donald Trumps (PERSON)
- 25 (CARDINAL)
- Wednesday (DATE)
- The European Commission (ORG)
- US (GPE)
- April 1 (DATE)
- April 13 (DATE)
- today (DATE)
- the United States (GPE)
- 28 billion (MONEY)
- 26 billion (CARDINAL)
- European Commission (ORG)
- Ursula von (PERSON)
- Leyen (GPE)
- EU (ORG)
- EU (ORG)
- two-week (DATE)
- around 18 billion (MONEY)
- EU (ORG)
- US (GPE)
- EU (ORG)
- von der (PERSON)
- Leyen (GPE)
- 1 (CARDINAL)
- 09178 (DATE)

Entities from article 47: 
- STOCKHOLM (GPE)
- March 12 (DATE)
- Reuters - Northvolt (ORG)
- Swedish (NORP)
- Wednesday (DATE)
- Sweden (GPE)
- Europes (PERSON)
- Asian (NORP)
- EV (ORG)
- Northvolt (GPE)
- US (GPE)
- Chapter 11 (LAW)
- last November (DATE)
- Sweden (GPE)
- over 5000 (CARDINAL)
- the end of January (DATE)
- more than 8 billion (MONEY)
- nine (CARDINAL)
- Northvolt (GPE)
- Chapter 11 (LAW)
- Swedish (NORP)
- one (CARDINAL)
- Saab Automobile (ORG)
- more than a decade ago (DATE)
- Marie Nilsson (PERSON)
- the IF Metall union (ORG)
- about 1800 (CARDINAL)
- North America (LOC)
- Germany (GPE)
- Polish (NORP)
- Reuters (ORG)
- Europes (PERSON)
- Northvolt (GPE)
- Chinese (NORP)
- 300750SZ (CARDINAL)
- EV (ORG)
- BYD (ORG)
- more than 10 billion (MONEY)
- 2016 (DATE)
- Volkswagen (ORG)
- 21 (CARDINAL)
- Goldman Sachs (ORG)
- 19 (CARDINAL)
- Creditors (ORG)
- Northvolt (GPE)
- early last year (DATE)
- 5 billion (CARDINAL)
- German (NORP)
- BMW (ORG)
- 2 billion (CARDINAL)
- June of last year (DATE)
- 2020 (DATE)
- Northvolt (GPE)
- Peter Carlsson (PERSON)
- the Chapter 11 (LAW)
- November (DATE)
- up to 12 billion (MONEY)
- recent months (DATE)
- zero (CARDINAL)
- Scania (ORG)
- this week (DATE)
- Northvolt (GPE)

Entities from article 48: 
- FRANKFURT (ORG)
- March 12 (DATE)
- Reuters (ORG)
- Christine Lagarde (PERSON)
- Wednesday (DATE)
- ECB (ORG)
- 2 (CARDINAL)
- Lagarde (GPE)
- Frankfurt (GPE)
- six (CARDINAL)
- June (DATE)
- last week (DATE)
- Trump (ORG)
- the last few years (DATE)
- the last few weeks (DATE)
- Lagarde (GPE)
- only a few months ago (DATE)
- ECB (ORG)
- Lagarde (GPE)
- Lagarde (GPE)
- 2 (CARDINAL)
- 2 (CARDINAL)

Entities from article 49: 
- March 12 (DATE)
- Reuters - Rheinmetall RHMGDE (ORG)
- Europes (PERSON)
- Wednesday (DATE)
- 2025 (DATE)
- German (NORP)
- 25 to 30 (CARDINAL)
- 2025 (DATE)
- Russian (NORP)
- Ukraine (GPE)
- the United States (GPE)
- Europe (LOC)
- around 155 (CARDINAL)
- 2024 (DATE)
- 152 (CARDINAL)
- Rheinmetall (ORG)
- Europe (LOC)
- Rheinmetall (ORG)
- the coming years (DATE)
- Armin Papperger (PERSON)
- Rheinmetall (ORG)
- 810 (CARDINAL)
- 2024 (DATE)
- 570 (CARDINAL)
- the year (DATE)
- Last week (DATE)
- European (NORP)
- Ukraine (GPE)
- Donald Trumps (PERSON)
- US (GPE)
- The European Commission (ORG)
- up to 800 billion (MONEY)
- 863 billion (MONEY)
- European (NORP)
- 150 billion (CARDINAL)
- 2024 (DATE)
- 975 billion (MONEY)
- 36 (CARDINAL)
- the year (DATE)
- the 999 billion (MONEY)
- Vara Research While (ORG)
- 80 (CARDINAL)
- Rheinmetall (ORG)
- last month (DATE)
- two (CARDINAL)
- Germany (GPE)
- 1 09174 (CARDINAL)

Entities from article 50: 
- March 12 (DATE)
- Reuters (ORG)
- Australian (NORP)
- Dick Friend (PERSON)
- Tesla TSLAO (ORG)
- 2015 (DATE)
- two (CARDINAL)
- Tesla (NORP)
- one (CARDINAL)
- last year (DATE)
- Donald Trump (PERSON)
- Friend (PERSON)
- Hobart (PERSON)
- Melbourne Tesla (PERSON)
- the four months (DATE)
- Trumps (PERSON)
- 35 (CARDINAL)
- the same time (DATE)
- last year (DATE)
- Australias Electric Vehicle Council (ORG)
- Australia (GPE)
- New Zealand (GPE)
- the last week (DATE)
- Trump (PERSON)
- Tesla (NORP)
- Tesla (NORP)
- Trumps (PERSON)
- 2024 17 (DATE)
- a year earlier (DATE)
- Europe (LOC)
- Europe Reuters (GPE)
- last week (DATE)
- Tesla (NORP)
- Australia (GPE)
- New Zealand (GPE)
- the last week (DATE)
- US (GPE)
- Tuesday (DATE)
- Tesla (ORG)
- the White House (FAC)
- Tesla (NORP)
- Australian (NORP)
- Tasmania a Tesla (ORG)
- last week (DATE)
- graffiti (PERSON)
- Nazi (NORP)
- Tesla (ORG)
- Australia (GPE)
- recent months (DATE)
- 25 years (DATE)
- the past six months (DATE)
- one (CARDINAL)
- Teslas (PERSON)
- New Zealand (GPE)
- Wednesday (DATE)
- 52-year-old (DATE)
- Tuesday (DATE)
- evening (TIME)
- Teslas (PERSON)
- Auckland Declines (ORG)
- Tesla (NORP)
- New Zealand (GPE)
- the last year (DATE)
- the Motor Industry of New Zealand (ORG)

Entities from article 51: 
- March 12 (DATE)
- Reuters (ORG)
- 40 (CARDINAL)
- US (GPE)
- this year (DATE)
- US (GPE)
- JP (ORG)
- Morgans (NORP)
- US (GPE)
- Bruce Kasman (PERSON)
- US (GPE)
- Singapore (GPE)
- Wednesday (DATE)
- roughly 40 (CARDINAL)
- about a (CARDINAL)
- 30 (CARDINAL)
- the start of the year (DATE)
- JP Morgans (ORG)
- 2 (CARDINAL)
- US (GPE)
- year (DATE)
- US (GPE)
- months (DATE)
- recent days (DATE)
- Donald Trump (PERSON)
- Ninety-five percent (PERCENT)
- Reuters (ORG)
- last week (DATE)
- Canada Mexico (GPE)
- US (GPE)
- Trumps (PERSON)
- Goldman Sachs (ORG)
- Morgan Stanley (ORG)
- last week (DATE)
- US (GPE)
- 17 (CARDINAL)
- 15 this year (DATE)
- Kasman (ORG)
- 50 (CARDINAL)
- Trump (ORG)
- April (DATE)
- Kasman (ORG)
- US (GPE)
- US (GPE)
- US (GPE)
- US (GPE)
- last week (DATE)
- Kasman (PERSON)
- US (GPE)
- this year (DATE)

Entities from article 52: 
- March 12 (DATE)
- Reuters - Chinese (ORG)
- this week (DATE)
- US (GPE)
- Chinese (NORP)
- US (GPE)
- China (GPE)
- CCTV (ORG)
- Wednesday (DATE)
- Weibo (ORG)
- Yuyuantantian (GPE)
- CCTV (ORG)
- Chinas commerce ministry (ORG)
- Walmart (ORG)
- March 11 (DATE)
- Last week (DATE)
- Bloomberg News (ORG)
- Chinese (NORP)
- as much as 10 (PERCENT)
- US (GPE)
- Donald Trump (PERSON)
- Chinas commerce ministry (ORG)

Entities from article 53: 
- NEW DELHI (GPE)
- March 12 (DATE)
- Reuters - Jaguar Land Rover (ORG)
- Tata Motors (ORG)
- 1 billion (CARDINAL)
- India (GPE)
- four (CARDINAL)
- British (NORP)
- EV (ORG)
- three (CARDINAL)
- India (GPE)
- JLR (ORG)
- about two months (DATE)
- Chinese (NORP)
- EV (ORG)
- Tata Passenger Electric Mobility Tatas (ORG)
- first (ORDINAL)
- Avinya (GPE)
- Tata (ORG)
- September (DATE)
- 250000 (CARDINAL)
- about 5-7 years (DATE)
- JLR (ORG)
- more than 70000 (CARDINAL)
- Tatas EV (ORG)
- 25000 (CARDINAL)
- Tata (ORG)
- Reuters (ORG)
- Tamil Nadu (PERSON)
- Tata (ORG)
- Tata (ORG)
- EV (ORG)
- JSW MG Motor (ORG)
- Mahindra (ORG)
- Mahindra MAHMNS (ORG)
- Tesla TSLAO (ORG)
- India (GPE)
- third (ORDINAL)
- 4 million (CARDINAL)
- EV (ORG)
- about 2 (CARDINAL)
- November (DATE)
- JLR (ORG)
- Mumbai (GPE)
- JLR (ORG)
- Britain (GPE)
- Europe (LOC)
- China (GPE)
- Tatas (ORG)
- Pune (LOC)
- Maharashtra Tatas EV (ORG)
- the end of January (DATE)
- two (CARDINAL)
- Tata (ORG)
- January (DATE)
- Avinya EV (ORG)
- 2026-2027 (DATE)
- this year (DATE)
- Tata (ORG)

Entities from article 54: 
- Spain (GPE)
- March 12 (DATE)
- Reuters - Zara (ORG)
- Inditex ITXMC (ORG)
- Wednesday (DATE)
- first quarter (DATE)
- February 1 (DATE)
- more than 8 (CARDINAL)
- Strong (GPE)
- Zara (PERSON)
- the past three years (DATE)
- Inditex (ORG)
- just 4 (CARDINAL)
- the February 1 to March 10 (EVENT)
- 11 (CARDINAL)
- a year ago (DATE)
- 88 (CARDINAL)
- the first quarter (DATE)
- Bernstein (PERSON)
- William Woods (PERSON)
- Inditex (ORG)
- the United States (GPE)
- second (ORDINAL)
- Spain (GPE)
- China (GPE)
- Mexico (GPE)
- Canada (GPE)
- US (GPE)
- Oscar Garcia Maceiras (PERSON)
- Maceiras (ORG)
- the past two years (DATE)
- the start of the season (DATE)
- the year ahead (DATE)
- 105 (CARDINAL)
- full-year (DATE)
- 386 billion (MONEY)
- The key holiday shopping quarter (DATE)
- 112 billion (MONEY)
- Inditex (ORG)
- Xavier Brun (PERSON)
- Madrid (GPE)
- Trea Asset Management (ORG)
- Inditex (ORG)
- the coming quarters (DATE)
- 2025 (DATE)
- Inditex (ORG)
- 2024 (DATE)
- 9 (CARDINAL)
- 59 billion (CARDINAL)
- the Bershka PullBear Massimo Dutti Stradivarius (ORG)
- Oysho (PERSON)
- 9 (CARDINAL)
- 168 (CARDINAL)
- Inditex (ORG)
- 18 billion (CARDINAL)
- this year (DATE)
- 214 (CARDINAL)
- first (ORDINAL)
- Iraq (GPE)
- this year (DATE)
- Bershka (ORG)
- Sweden (GPE)
- Oysho (PERSON)
- first (ORDINAL)
- Netherlands (GPE)
- Germany Inditex (ORG)
- Zacaffe (NORP)
- Zara (GPE)
- Madrid (GPE)

Entities from article 55: 
- WASHINGTON (GPE)
- March 11 (DATE)
- Reuters - US (ORG)
- Donald Trump (PERSON)
- Tuesday (DATE)
- Americas (LOC)
- Republican (NORP)
- about 100 (CARDINAL)
- the Business Roundtable (ORG)
- Apple (ORG)
- JPMorgan Chase JPMN (ORG)
- Walmart (ORG)
- Trump (PERSON)
- the White House (ORG)
- Monday (DATE)
- US (GPE)
- Tuesday (DATE)
- 500 (CARDINAL)
- 53 (CARDINAL)
- 2025 (DATE)
- Mondays (DATE)
- this year (DATE)
- the weekend (DATE)
- Trump (ORG)
- Tuesday (DATE)
- Trump (ORG)
- Trump (ORG)
- Trump (PERSON)
- Trump (PERSON)
- 15 (CARDINAL)
- US (GPE)
- Trump (ORG)
- a few weeks ago (DATE)
- Chinese (NORP)
- Xi Jinping (PERSON)
- Trumps (PERSON)
- Trump (ORG)
- Tuesday (DATE)
- Canada (GPE)
- hours (TIME)
- Americas (LOC)
- 50 (CARDINAL)
- The White House (ORG)
- 25 (CARDINAL)
- Canadian (NORP)
- Trumps (PERSON)
- The White House (ORG)
- Trump (ORG)
- the United States Markets (ORG)
- Trump (PERSON)
- Trump (ORG)
- first (ORDINAL)
- 2021 (DATE)
- 2020 (DATE)
- 2024 (DATE)
- Trump (PERSON)
- an additional 20 (CARDINAL)
- Chinese (NORP)
- the United States (GPE)
- 25 (CARDINAL)
- Canada (GPE)
- Mexico (GPE)
- US (GPE)
- April 2 (DATE)
- Trumps (PERSON)
- Larry Fink (PERSON)
- Business Roundtable (ORG)
- Monday Last week (DATE)
- the Business Roundtable (ORG)

Entities from article 56: 
- March 24 (DATE)
- Reuters - Cloud (ORG)
- Dropbox DBXO (PERSON)
- the Wall Street Journal (ORG)
- Monday (DATE)
- Dropbox (PERSON)
- Half (CARDINAL)
- Reuters (ORG)
- Half (CARDINAL)
- Moon (PERSON)
- Dropboxs (ORG)
- Drew Houston (PERSON)
- Half Moon Capital (ORG)
- around 40000 (CARDINAL)
- Dropbox (ORG)
- about 11 million (CARDINAL)
- WSJ (ORG)
- Houston (GPE)
- roughly 77 (CARDINAL)
- 10 (CARDINAL)
- Half (CARDINAL)
- Moon Capital (ORG)
- annual (DATE)
- October 2024 (DATE)
- 20 (CARDINAL)
- 16 (CARDINAL)
- 2023 (DATE)

Entities from article 57: 
- March 25 (DATE)
- Reuters - Chinas Xiaomi Corp (ORG)
- Tuesday (DATE)
- 55 billion (MONEY)
- 800 million (CARDINAL)
- HK5325 (FAC)
- the Hong Kong Stock Exchange (ORG)
- third (ORDINAL)
- last year (DATE)
- 750 million (CARDINAL)
- HK5280 (PRODUCT)
- Monday (DATE)
- 66 (CARDINAL)
- Xiaomis (NORP)
- Monday 1 77742 (DATE)

Entities from article 58: 
- ORLANDO Florida March (ORG)
- Reuters (ORG)
- third (ORDINAL)
- this year (DATE)
- week of the quarter (DATE)
- Monday (DATE)
- Trump (ORG)
- April 2 (DATE)
- US (GPE)
- Big Tech Europe (ORG)
- Asian (NORP)
- US (GPE)
- Bond (ORG)
- a third day (DATE)
- November (DATE)
- 10 (CARDINAL)
- 500 (CARDINAL)
- 4 (CARDINAL)
- November 2022 (DATE)
- 12 (CARDINAL)
- Tesla (NORP)
- US (GPE)
- Ill (PERSON)
- US (GPE)
- first (ORDINAL)
- Mondays (DATE)
- Trump (ORG)
- Monday (DATE)
- the next few days (DATE)
- Washington (GPE)
- Trump (ORG)
- 25 (CARDINAL)
- Venezuela (GPE)
- Brent (ORG)
- more than 1 (CARDINAL)
- Monday (DATE)
- three weeks (DATE)
- fourth (ORDINAL)
- daily (DATE)
- Monday (DATE)
- Atlanta Federal Reserve (ORG)
- Raphael Bostic (PERSON)
- Monday (DATE)
- one quarter (DATE)
- this year (DATE)
- Bostic (ORG)
- Fed (ORG)
- this year (DATE)
- Feds (PERSON)
- 19 (CARDINAL)
- last weeks (DATE)
- Asia (LOC)
- Beijings (ORG)
- Chinas (ORG)
- He Lifeng (PERSON)
- Apple (ORG)
- Mastercard Cargill (PERSON)
- Sunday (DATE)
- China (GPE)
- Beijing (GPE)
- September (DATE)
- Chinas (ORG)
- the quarter (DATE)
- US (GPE)
- the first quarter (DATE)
- US (GPE)
- US (GPE)
- US (GPE)
- 28 billion (MONEY)
- January (DATE)
- US (GPE)
- 748 billion (MONEY)
- Treasury International Capital (ORG)
- US (GPE)
- a single month (DATE)
- monthly (DATE)
- US (GPE)
- a year (DATE)
- US (GPE)
- 15 (CARDINAL)
- the past few weeks (DATE)
- one month (DATE)
- many more months (DATE)
- US (GPE)
- recent years (DATE)
- TIC (ORG)
- US (GPE)
- last year (DATE)
- 980 billion (MONEY)
- 668 billion (MONEY)
- the year before (DATE)
- 16 trillion (CARDINAL)
- 2022 (DATE)
- US (GPE)
- the last three calendar years (DATE)
- 325 trillion (MONEY)
- US (GPE)
- the end of last year (DATE)
- 18 (CARDINAL)
- US (GPE)
- Goldman Sachs (ORG)
- 1945 (DATE)
- more than 1 trillion (MONEY)
- Goldman Sachs (ORG)
- US (GPE)
- David Kostin (PERSON)
- US (GPE)
- this year (DATE)
- US (GPE)
- this year (DATE)
- last year (DATE)
- 300 billion (CARDINAL)
- 304 billion (CARDINAL)
- 2024 (DATE)
- US (GPE)
- US (GPE)
- G10 FX (ORG)
- Steven Englander Cyclical (PERSON)
- US (GPE)
- TIC (ORG)
- recent weeks (DATE)
- February (DATE)
- March (DATE)
- US (GPE)
- recent weeks (DATE)
- Chinas DeepSeek (ORG)
- Germanys (GPE)
- Trump (ORG)
- The next few months (DATE)
- tomorrow (DATE)
- today (DATE)
- today (DATE)
- ReutersJamie (ORG)
- Reuters News (ORG)
- Trading Day (DATE)
- morning (TIME)

Entities from article 59: 
- WASHINGTON (GPE)
- March 24 (DATE)
- Reuters - Boeing (ORG)
- two (CARDINAL)
- 737 (CARDINAL)
- MAX (ORG)
- the Wall Street Journal (ORG)
- Monday (DATE)
- US (GPE)
- December (DATE)
- Boeing (ORG)
- the Justice Department (ORG)
- Donald Trump (PERSON)
- January 20 (DATE)
- Boeing (ORG)
- The Justice Department (ORG)
- July (DATE)
- Boeing (ORG)
- two (CARDINAL)
- 737 (CARDINAL)
- MAX (ORG)
- 4872 million (CARDINAL)
- 455 million (CARDINAL)
- three years (DATE)
- two (CARDINAL)
- 737 (CARDINAL)
- MAX (ORG)
- 2018 (DATE)
- 2019 (DATE)
- 346 (CARDINAL)
- Boeing (ORG)
- Boeing (ORG)
- the US Federal Aviation Administration (ORG)
- May (DATE)
- DOJ (ORG)
- Boeing (ORG)
- 2021 (DATE)
- Boeing (ORG)
- January 5 2024 (DATE)
- Alaska Airlines (ORG)
- Boeing (ORG)
- Reed OConnor (PERSON)
- Fort Worth (GPE)
- Texas (GPE)
- 2023 (DATE)
- Boeings (ORG)
- US (GPE)
- Trump (ORG)
- Boeing (ORG)
- Boeing (ORG)
- Steve Bradbury (PERSON)
- Boeing (ORG)
- Friday (DATE)
- the US Air Forces (ORG)

Entities from article 60: 
- March 24 (DATE)
- Reuters (ORG)
- Brad Lightcap (PERSON)
- Sam Altman (PERSON)
- Monday (DATE)
- Microsoft (ORG)
- AI industry Altman (ORG)
- OpenAI (ORG)
- SoftBank Group 9984 (ORG)
- Oracle ORCLN (ORG)
- 500-billion (MONEY)
- Brad (PERSON)
- Lightcap (PERSON)
- Altman (GPE)
- OpenAI (ORG)
- 2018 (DATE)
- OpenAI (GPE)
- Mark Chen (PERSON)
- post Altman (ORG)
- February (DATE)
- OpenAI (ORG)
- San Francisco (GPE)
- OpenAI (ORG)
- 40 billion (CARDINAL)
- AI (ORG)
- 66 billion (CARDINAL)
- October (DATE)

Entities from article 61: 
- March 24 (DATE)
- Reuters (ORG)
- US (GPE)
- 74 (CARDINAL)
- February (DATE)
- the same period a year earlier (DATE)
- the Equipment Leasing and Finance Association (ORG)
- Monday (DATE)
- February (DATE)
- 97 billion (CARDINAL)
- 98 billion (CARDINAL)
- the year-ago (DATE)
- the last few months (DATE)
- Leigh Lytle (PERSON)
- Washington (GPE)
- US (GPE)
- 75 (CARDINAL)
- February (DATE)
- the previous year (DATE)
- The Equipment Leasing Finance Foundation (ORG)
- March (DATE)
- 581 (CARDINAL)
- the next four months (DATE)
- 25 (CARDINAL)
- Bank of America (ORG)
- Caterpillar (ORG)
- Dell Technologies DELLN (ORG)
- Siemens AG (ORG)
- Canon (LAW)
- Volvo AB (ORG)

Entities from article 62: 
- NEW YORK (GPE)
- March 24 (DATE)
- Reuters - JPMorgan Chase (ORG)
- Walmart (ORG)
- Walmarts (ORG)
- JPMorgans (ORG)
- Lia Cao (PERSON)
- US (GPE)
- Walmart (ORG)
- more than 700 million (CARDINAL)
- 100000 (CARDINAL)
- Marketplace (ORG)
- 40 (CARDINAL)
- the fourth quarter (DATE)
- More than 2 trillion (MONEY)
- McKinsey (PERSON)
- JPMorgan (ORG)
- Cao (ORG)
- over 20 (CARDINAL)
- the next year (DATE)
- Walmart (ORG)
- US (GPE)
- Europe (LOC)
- Cao (ORG)

Entities from article 63: 
- March 24 (DATE)
- Reuters - Social (ORG)
- platform Xs (PERSON)
- Haofei Wang (PERSON)
- Monday (DATE)

Entities from article 64: 
- WASHINGTON (GPE)
- March 24 (DATE)
- Reuters (ORG)
- US (GPE)
- 366 trillion (MONEY)
- mid-July (DATE)
- early October (DATE)
- Congress (ORG)
- Washingtons (ORG)
- the Bipartisan Policy Center (ORG)
- Monday Lawmakers (WORK_OF_ART)
- Congresss (NORP)
- Congressional Budget Office (ORG)
- Wednesday (DATE)
- the Treasury Department (ORG)
- Shai Akabas (PERSON)
- BPC (ORG)
- the X Date (ORG)
- US (GPE)
- 2023 (DATE)
- US (GPE)
- mid-April (DATE)
- Americans (NORP)
- annual (DATE)
- This year (DATE)
- US (GPE)
- Donald Trump (PERSON)
- BPC (LOC)
- quarterly (DATE)
- June 15 (DATE)

Entities from article 65: 
- March 24 (DATE)
- Reuters - South Koreas Hyundai Motor Group (ORG)
- 21 billion (CARDINAL)
- the United States (GPE)
- Donald Trump (PERSON)
- the White House (FAC)
- Monday (DATE)
- 58 billion (MONEY)
- Hyundai Steel (ORG)
- Louisiana (GPE)
- over 27 million metric tons (QUANTITY)
- annually (DATE)
- more than 1400 (CARDINAL)
- Alabama (GPE)
- Georgia (GPE)
- Hyundai (ORG)
- 9 billion (MONEY)
- 2028 (DATE)
- US (GPE)
- 12 million (CARDINAL)
- the United States (GPE)
- 6 billion (CARDINAL)
- US (GPE)
- Hyundai Motor (ORG)
- 759 billion (MONEY)
- Georgia (GPE)
- Wednesday (DATE)
- Alabama (GPE)
- Kia (ORG)
- Georgia (GPE)
- two (CARDINAL)
- 700000 (CARDINAL)
- a year (DATE)
- Georgia (GPE)
- 300000 (CARDINAL)
- Louisiana (GPE)
- Jeff Landry (PERSON)
- South Korea (GPE)
- October (DATE)
- Hyundai (ORG)
- Trump (ORG)
- South Korean (NORP)
- 3 billion (MONEY)
- LNG (ORG)
- the United States (GPE)
- Trumps (PERSON)
- the White House (ORG)
- Hyundai (ORG)
- 2022 (DATE)
- about 10 billion (MONEY)
- 2025 (DATE)
- the United States (GPE)
- Trump (ORG)
- April 2 (DATE)
- South Korea (GPE)
- the United States (GPE)
- Trump (ORG)
- Monday (DATE)
- this week (DATE)
- Trump (ORG)
- US (GPE)
- 25 (CARDINAL)
- hundreds (CARDINAL)
- Automakers (GPE)
- the White House (ORG)
- GM (ORG)
- Mary Barra (PERSON)
- Trump (ORG)
- earlier this month (DATE)
- 60 billion (MONEY)
- the United States (GPE)

Entities from article 66: 
- March 24 (DATE)
- Reuters - Tesla (ORG)
- 10 (CARDINAL)
- Monday (DATE)
- this years (DATE)
- Trump (PERSON)
- April 2 (DATE)
- 40 (CARDINAL)
- this year (DATE)
- EV (ORG)
- three (CARDINAL)
- US (GPE)
- two-week (DATE)
- Tesla (PERSON)
- one-day (DATE)
- November 6 (DATE)
- Trump (ORG)
- US (GPE)
- about 870 billion (MONEY)
- 154 trillion (MONEY)
- December (DATE)
- the first quarter of the year (DATE)
- EV (ORG)
- Last Thursday (DATE)
- late-night (TIME)
- Tesla (NORP)
- platform X Tesla (PERSON)
- 19 (CARDINAL)
- 11 (CARDINAL)
- JPMorgan (ORG)
- EV (ORG)
- last week (DATE)
- two-day (DATE)
- Dennis Dick (PERSON)
- Dick (PERSON)
- Tesla (NORP)
- Teslas (PRODUCT)
- 85 (CARDINAL)
- Ford (ORG)
- General Motors GMN (ORG)
- LSEG Teslas Chinese (PERSON)
- BYD (ORG)
- 73 (CARDINAL)
- fourth-quarter (DATE)
- Monday (DATE)
- annual (DATE)
- 2024 (DATE)
- 100 billion mark (MONEY)
- US (GPE)
- Tesla (ORG)
- today (DATE)
- Chinese (NORP)
- BYD (PERSON)
- SP (ORG)
- Danni Hewson (PERSON)
- AJ Bell (ORG)

Entities from article 67: 
- March 24 (DATE)
- Reuters - Massachusetts (ORG)
- Robinhoods HOODO (PERSON)
- March Madness (ORG)
- Massachusetts (GPE)
- State (ORG)
- Bill Galvin (PERSON)
- Reuters (ORG)
- Monday (DATE)
- Robinhood (PRODUCT)
- Galvin a (PERSON)
- Democrat (NORP)
- Robinhood (PRODUCT)
- last week (DATE)
- NCAA (ORG)
- Robinhood (PRODUCT)
- Robinhood (PRODUCT)
- the US Commodity Futures Trading Commission (ORG)
- CFTC (ORG)
- one (CARDINAL)
- first (ORDINAL)
- Robinhood (PRODUCT)
- Menlo Park California (GPE)
- Robinhood (ORG)
- US (GPE)
- March 17 came a month (DATE)
- Robinhood (PRODUCT)
- CFTC (ORG)
- the Super Bowl (EVENT)
- Robinhood (PRODUCT)
- NCAA (ORG)
- CFTC (ORG)
- recent weeks (DATE)
- CFTC (ORG)
- Robinhood (PRODUCT)
- CFTC (ORG)
- Robinhood (PRODUCT)
- Galvins (GPE)
- CFTCs (ORG)
- February (DATE)
- Republican (NORP)
- Donald Trumps (PERSON)
- CFTC (ORG)
- Brian Quintenz (PERSON)
- Kalshis (NORP)
- Donald Trump Jr (PERSON)
- Kalshi (PERSON)
- Galvin and Robinhood (ORG)
- 2020 (DATE)
- Robinhood (PRODUCT)
- Robinhood (PRODUCT)
- 2024 (DATE)
- 75 million (CARDINAL)
- Galvins (PERSON)
- 2021 (DATE)

Entities from article 68: 
- March 24 (DATE)
- Reuters - The SP 500 (ORG)
- over two weeks (DATE)
- Monday (DATE)
- Nvidia (GPE)
- Tesla (ORG)
- Trump (PERSON)
- US (GPE)
- US (GPE)
- Donald Trump (PERSON)
- Nvidia NVDAO (ORG)
- 3 (CARDINAL)
- Advanced Micro (ORG)
- 7 (CARDINAL)
- PHLX (ORG)
- SOX (ORG)
- 3 (CARDINAL)
- Tesla TSLAO (ORG)
- almost 12 (CARDINAL)
- one-day (DATE)
- early November (DATE)
- US (GPE)
- recent weeks (DATE)
- Trump (ORG)
- last month (DATE)
- US (GPE)
- China (GPE)
- Mexico (GPE)
- Canada (GPE)
- about 4 (CARDINAL)
- March 13 (DATE)
- 6 (CARDINAL)
- February 19 (DATE)
- Sam Stovall (PERSON)
- LSEG (PERSON)
- Friday (DATE)
- 105 (CARDINAL)
- 2025 (DATE)
- 35 (CARDINAL)
- the beginning of the year (DATE)
- 500 (CARDINAL)
- 176 (CARDINAL)
- 576757 (DATE)
- Nasdaq (ORG)
- 227 (CARDINAL)
- 1818859 (DATE)
- 142 (CARDINAL)
- 4258332 (CARDINAL)
- Russell (PERSON)
- 255 (CARDINAL)
- two-week (DATE)
- CBOE (ORG)
- 18 (CARDINAL)
- one-month (DATE)
- US (GPE)
- 136 billion (CARDINAL)
- 165 billion (CARDINAL)
- 20 (CARDINAL)
- Ten (CARDINAL)
- 11 (CARDINAL)
- 407 (CARDINAL)
- Tesla (ORG)
- 21 (CARDINAL)
- US (GPE)
- March (DATE)
- this week (DATE)
- the Personal Consumption Expenditure (ORG)
- Federal Reserves (ORG)
- Friday (DATE)
- Dun Bradstreet (ORG)
- 3 (CARDINAL)
- Clearlake Capital (ORG)
- 77 billion (CARDINAL)
- Lockheed (ORG)
- Martin (PRODUCT)
- Global Research (ORG)
- Crypto (PERSON)
- 4 (CARDINAL)
- MicroStrategy (ORG)
- 10 (CARDINAL)
- Coinbase COINO (ORG)
- 7 (CARDINAL)
- 500 (CARDINAL)
- 54 (CARDINAL)
- 500 (CARDINAL)
- 5 (CARDINAL)
- 1 (CARDINAL)
- Nasdaq (ORG)
- 46 (CARDINAL)
- 97 (CARDINAL)

Entities from article 69: 
- WASHINGTON (GPE)
- March 24 (DATE)
- Reuters - US (ORG)
- March (DATE)
- the year (DATE)
- SP Global (ORG)
- Monday (DATE)
- this month (DATE)
- nearly two years (DATE)
- Companies (ORG)
- Donald Trump (PERSON)
- the White House (ORG)
- next month (DATE)
- Trump (PERSON)
- thousands (CARDINAL)
- Chris Williamson (PERSON)
- SP Global Market Intelligence (ORG)
- SP Globals (ORG)
- US (GPE)
- PMI (ORG)
- 535 (CARDINAL)
- this month (DATE)
- 516 (CARDINAL)
- February (DATE)
- 50 (CARDINAL)
- March 12-21 (DATE)
- PMI (ORG)
- two straight months (DATE)
- PMI (ORG)
- the first quarter (DATE)
- Gross (PERSON)
- the first quarter (DATE)
- 15 (CARDINAL)
- 23 (CARDINAL)
- October-December quarter (DATE)
- The Federal Reserve (ORG)
- last week (DATE)
- 17 (CARDINAL)
- this year (DATE)
- 21 (CARDINAL)
- December (DATE)
- US (GPE)
- the Personal Consumption Expenditures (ORG)
- 28 (CARDINAL)
- this year (DATE)
- 25 (CARDINAL)
- Fed (ORG)
- 2 (CARDINAL)
- overnight (TIME)
- 425 (CARDINAL)
- second (ORDINAL)
- 2022 (DATE)
- 609 (CARDINAL)
- April 2023 (DATE)
- 584 (CARDINAL)
- February (DATE)
- August 2022 (DATE)
- Services (ORG)
- 536 (CARDINAL)
- 523 (CARDINAL)
- last month (DATE)
- Feds (NORP)
- Williamson (PERSON)
- 533 (CARDINAL)
- this month (DATE)
- 519 (CARDINAL)
- February (DATE)
- 506 (CARDINAL)
- 494 (CARDINAL)
- January (DATE)
- PMI (ORG)
- 498 (CARDINAL)
- 527 (CARDINAL)
- February (DATE)
- Reuters (ORG)
- PMI (ORG)
- 517 (CARDINAL)
- PMI (ORG)
- 543 (CARDINAL)
- 510 (CARDINAL)
- last month (DATE)
- PMI (ORG)
- 508 (CARDINAL)

Entities from article 70: 
- FRANKFURT (ORG)
- March 25 (DATE)
- Reuters - US (ORG)
- Donald Trumps (PERSON)
- Europe (LOC)
- Europhoria (ORG)
- Trumps (PERSON)
- Americas (LOC)
- Europe (LOC)
- Germanys (GPE)
- hundreds of billions (MONEY)
- European (NORP)
- US (GPE)
- European (NORP)
- Make Europe Great Again (FAC)
- Trumps MAGA (PERSON)
- Europes (PERSON)
- the United States (GPE)
- Europhoria (ORG)
- Holger Schmieding (PERSON)
- German (NORP)
- Berenberg (PERSON)
- Europe (LOC)
- Euro (LAW)
- 12 (CARDINAL)
- Trumps (PERSON)
- January 20 (DATE)
- US (GPE)
- 67 (CARDINAL)
- US (GPE)
- European (NORP)
- Reuters (ORG)
- 2026 (DATE)
- first (ORDINAL)
- nearly a year (DATE)
- 13 (CARDINAL)
- 12 (CARDINAL)
- 2 (CARDINAL)
- the United States (GPE)
- Monday (DATE)
- seven months (DATE)
- European (NORP)
- Angelique Renkhoff-Muecke (PERSON)
- US (GPE)
- the United States (GPE)
- April 2 (DATE)
- Europes The European Central Bank (ORG)
- 25 (CARDINAL)
- US (GPE)
- Europe (LOC)
- about 03 (CARDINAL)
- the first year (DATE)
- Europe (LOC)
- about half (CARDINAL)
- Europe (LOC)
- Trumps (PERSON)
- Atanas Kolev (ORG)
- one (CARDINAL)
- Germanys (GPE)
- Rheinmetall (ORG)
- Europes (PERSON)
- 2025 (DATE)
- MBDA (ORG)
- Italian (NORP)
- Germanys Heidelberg Materials (ORG)
- Austrias Strabag STRVVI (PERSON)
- Swiss (NORP)
- Frances SPIE SPIEPA (PERSON)
- German (NORP)
- Peter Huebner (PERSON)
- Germanys (GPE)
- HDB (ORG)
- Strabags German (NORP)
- this year (DATE)
- two and a half (DATE)
- HDB (ORG)
- Reuters (ORG)
- 2025 (DATE)
- first (ORDINAL)
- five years (DATE)
- January (DATE)
- 14 this year (DATE)
- years (DATE)
- Money (PRODUCT)
- Stefan Rauber (PERSON)
- German (NORP)
- Saarstahl Klaus Adam (PERSON)
- University College London (ORG)
- 32-year-old (DATE)
- Stability (ORG)

Entities from article 71: 
- HONG KONG (GPE)
- March 25 (DATE)
- Reuters - Alibaba Group (ORG)
- 9988HK (CARDINAL)
- Joe Tsai (PERSON)
- Tuesday (DATE)
- Xi Jinpings (PERSON)
- February (DATE)
- US (GPE)
- Tsai (PERSON)
- Xi (PERSON)
- Chinese (NORP)
- Alibaba (GPE)
- Jack Ma (PERSON)
- Beijings (GPE)
- four years ago (DATE)
- US (GPE)
- second (ORDINAL)
- Tsai (PERSON)
- Global Investment Summit (ORG)
- Hong Kong (GPE)
- Alibabas (PERSON)
- the past 12 quarters (DATE)
- Chinas (ORG)
- the past few years (DATE)
- Tsai (PERSON)
- Guo Shan (PERSON)
- Hutong Research (ORG)
- Alibaba (GPE)
- Chinas 13 million (LAW)
- every year (DATE)
- roughly a quarter (CARDINAL)
- DeepSeek (PRODUCT)
- Chinese (NORP)
- AI (ORG)
- DeepSeek (PRODUCT)
- Chinas (ORG)
- Alibaba (GPE)
- Tsai (PERSON)
- the United States (GPE)
- about 500 billion several hundred billion dollars (MONEY)
- today (DATE)
- at least 380 billion yuan 52 billion (MONEY)
- the next three years (DATE)
- Alibaba (GPE)
- 24 this year (DATE)
- Xis (PERSON)
- DeepSeeks (PRODUCT)

Entities from article 72: 
- March 25 (DATE)
- Reuters - Smithfield Foods SFDO (ORG)
- annual (DATE)
- Tuesday (DATE)
- first (ORDINAL)
- quarterly (DATE)
- January (DATE)
- WH Group (ORG)
- Smithfield (GPE)
- Virginia (GPE)
- Reuters (ORG)
- January (DATE)
- US (GPE)
- Vernon California (FAC)
- Charlotte North Carolina (GPE)
- 2023 (DATE)
- nearly 59 (CARDINAL)
- 22 (CARDINAL)
- a year earlier (DATE)
- the fourth quarter (DATE)
- fiscal 2025 (DATE)
- between 110 billion and 130 billion (MONEY)
- the 12 months ended December 29 2024 (DATE)
- Smithfield (PERSON)
- Donald Trumps (PERSON)
- last week (DATE)
- US (GPE)
- Smithfield (ORG)
- annual (DATE)
- 34 (CARDINAL)
- fiscal 2024 (DATE)
- the fourth quarter (DATE)
- 12 (CARDINAL)
- 395 billion (MONEY)
- 54 cents (MONEY)
- 25 cents (MONEY)
- a year earlier (DATE)

Entities from article 73: 
- NEW DELHI (GPE)
- March 25 (DATE)
- Reuters - India (ORG)
- more than half (CARDINAL)
- US (GPE)
- 23 billion (MONEY)
- first (ORDINAL)
- two (CARDINAL)
- two (CARDINAL)
- years (DATE)
- South Asian (NORP)
- US (GPE)
- Donald Trumps (PERSON)
- April 2 (DATE)
- Western (NORP)
- New Delhi (GPE)
- 87 (CARDINAL)
- the United States (GPE)
- 66 billion (MONEY)
- two (CARDINAL)
- Reuters (ORG)
- India (GPE)
- 55 (CARDINAL)
- US (GPE)
- 5 (CARDINAL)
- 30 (CARDINAL)
- India (GPE)
- more than 23 billion (MONEY)
- the United States (GPE)
- one (CARDINAL)
- US (GPE)
- about 22 (CARDINAL)
- the World Trade Organization (ORG)
- 12 (CARDINAL)
- The United States (GPE)
- 456 billion (MONEY)
- India (GPE)
- Narendra Modis (PERSON)
- US (GPE)
- February (DATE)
- two (CARDINAL)
- New Delhi (GPE)
- US (GPE)
- South (LOC)
- Central Asia Brendan Lynch (ORG)
- United States (GPE)
- Tuesday (DATE)
- Indian (NORP)
- more than half (CARDINAL)
- US (GPE)
- one (CARDINAL)
- India (GPE)
- the United States (GPE)
- one (CARDINAL)
- Modi (GPE)
- first (ORDINAL)
- Trump (PERSON)
- November (DATE)
- US (GPE)
- India (GPE)
- New Delhi (GPE)
- 6 to 10 (CARDINAL)
- half (CARDINAL)
- the United States (GPE)
- second (ORDINAL)
- 11 billion (CARDINAL)
- US (GPE)
- Indonesia (GPE)
- Israel (GPE)
- Vietnam (GPE)
- Modis (NORP)
- India (GPE)
- 30 to 60 (CARDINAL)
- third (ORDINAL)
- New Delhi (GPE)
- more than 100 (CARDINAL)
- fourth (ORDINAL)
- March 10 (DATE)
- US (GPE)
- Howard Lutnick (PERSON)
- India (GPE)
- the United States (GPE)
- Sunil Barthwal (PERSON)
- two (CARDINAL)
- Lutnick (GPE)
- India (GPE)
- this year (DATE)
- Trump (ORG)
- Milan Vaishnav (ORG)
- South Asian (NORP)
- the Carnegie Endowment for International Peace (ORG)
- Trump (PERSON)

Entities from article 74: 
- SAN FRANCISCO March (ORG)
- Reuters - Quantum (ORG)
- PsiQuantum (ORG)
- at least 750 million (CARDINAL)
- 6 billion (MONEY)
- two (CARDINAL)
- PsiQuantum (ORG)
- New York (GPE)
- millions (CARDINAL)
- Startups (ORG)
- thousands or millions (MONEY)
- Scientists (NORP)
- recent months (DATE)
- Google Microsoft (ORG)
- Amazon (ORG)
- Last week (DATE)
- Nvidia (PERSON)
- Boston PsiQuantum (ORG)
- Australia (GPE)
- the United States (GPE)
- two (CARDINAL)
- the coming years - one (DATE)
- Brisbane (GPE)
- Australia (GPE)
- Chicago Quantum (ORG)
- decades (DATE)
- recent years (DATE)
- PsiQuantum (ORG)
- 2029 (DATE)
- Google (ORG)
- earlier this year (DATE)
- five years (DATE)

Entities from article 75: 
- NEW DELHI (GPE)
- March 25 (DATE)
- Reuters - India (ORG)
- Samsung (ORG)
- 601 million (CARDINAL)
- one (CARDINAL)
- recent years (DATE)
- last years (DATE)
- 955 million (CARDINAL)
- Samsung (ORG)
- India (GPE)
- 2023 (DATE)
- 10 or 20 (CARDINAL)
- Mukesh Ambanis (ORG)
- Reliance Jio Samsung (PERSON)
- years (DATE)
- January 8 (DATE)
- Reuters Samsung (ORG)
- Indian (NORP)
- Sonal Bajaj (PERSON)
- customs (ORG)
- Samsung (ORG)
- Bajaj (PERSON)
- Samsung (ORG)
- 005930KS (CARDINAL)
- 446 billion rupees (MONEY)
- 520 million (CARDINAL)
- 100 (CARDINAL)
- Seven (CARDINAL)
- India (GPE)
- 81 million (CARDINAL)
- Sung Beam (PERSON)
- Dong Won Chu (PERSON)
- Nikhil Aggarwal Samsungs (ORG)
- Samsung (ORG)
- Indian (NORP)
- Reuters (ORG)
- Reliance (ORG)
- India (GPE)
- Volkswagen (ORG)
- New Delhi (GPE)
- 14 billion (CARDINAL)
- German (NORP)
- India (GPE)
- Samsung (ORG)
- 2021 (DATE)
- Mumbai (GPE)
- Gurugram (ORG)
- New Delhi (GPE)
- Samsung (ORG)
- the Remote Radio Head (ORG)
- one (CARDINAL)
- 4 (CARDINAL)
- 2018 (DATE)
- 2021 (CARDINAL)
- Indian (NORP)
- Samsung (ORG)
- 784 million (CARDINAL)
- Korea (GPE)
- Vietnam (GPE)
- Samsung (ORG)
- Samsung (ORG)
- four (CARDINAL)
- 2020 (CARDINAL)
- Samsung (ORG)
- Indian (NORP)
- Samsung (ORG)

Entities from article 76: 
- SINGAPORE (GPE)
- March 25 (DATE)
- Reuters - Asian (ORG)
- Tuesday (DATE)
- Chinese (NORP)
- US (GPE)
- three-week (DATE)
- US (GPE)
- Donald Trump (PERSON)
- Trump (ORG)
- Monday (DATE)
- April 2 (DATE)
- over two weeks (DATE)
- Nasdaq (ORG)
- 2 (CARDINAL)
- Monday (DATE)
- Asian (NORP)
- Tuesday (DATE)
- morning (TIME)
- mid-afternoon (TIME)
- Asia-Pacific (LOC)
- Japan (GPE)
- 035 (CARDINAL)
- European (NORP)
- European (NORP)
- 024 (CARDINAL)
- 500 (CARDINAL)
- Nasdaq (ORG)
- Charu Chanana (PERSON)
- Saxo (ORG)
- Chanana (PERSON)
- Kyle Rodda (PERSON)
- Capitalcom (ORG)
- Trump (ORG)
- Hong Kongs Hang Seng (ORG)
- 18 (CARDINAL)
- Xiaomis (NORP)
- 55 billion (CARDINAL)
- Xiaomis (NORP)
- 6000 (CARDINAL)
- Steven Leung (PERSON)
- Kay Hian (PERSON)
- Hong Kong (GPE)
- Hang Seng (ORG)
- this year (DATE)
- The Hang Seng (ORG)
- 17 (CARDINAL)
- this year (DATE)
- AI (ORG)
- DeepSeeks (PRODUCT)
- three-week (DATE)
- 15095 (DATE)
- 09 (CARDINAL)
- March 6 (DATE)
- 10781 (DATE)
- US (GPE)
- Data (ORG)
- SP Globals (ORG)
- US (GPE)
- PMI (ORG)
- 535 (CARDINAL)
- this month (DATE)
- 516 (CARDINAL)
- February (DATE)
- 50 (CARDINAL)
- PMI (ORG)
- the first quarter (DATE)
- Indonesian (NORP)
- June 1998 (DATE)
- Asian (NORP)
- next week (DATE)
- Trump (PERSON)
- Tuesday (DATE)
- 1 (CARDINAL)
- Trumps (PERSON)
- Venezuela Brent (ORG)
- 2 cents (MONEY)
- 7302 (DATE)
- US (GPE)
- 6911 (DATE)
- 301587 (DATE)
- Federal Reserve (ORG)
- this year (DATE)

Entities from article 77: 
- March 25 (DATE)
- Reuters - Teslas TSLAO (ORG)
- Europe (LOC)
- year-on-year (DATE)
- February (DATE)
- Tuesday (DATE)
- a second month (DATE)
- EV (ORG)
- European (NORP)
- Elon Musks (ORG)
- BEV (ORG)
- 426 (CARDINAL)
- Europe (LOC)
- the European Automobile Manufacturers Association (ORG)
- Tesla (ORG)
- 18 (CARDINAL)
- 103 (CARDINAL)
- BEV (ORG)
- February (DATE)
- 28 (CARDINAL)
- 216 (CARDINAL)
- last year (DATE)
- fewer than 17000 (CARDINAL)
- the European Union (ORG)
- Britain (GPE)
- European Free Trade Association (ORG)
- 28000 (CARDINAL)
- the same month (DATE)
- 2024 (DATE)
- Tesla (ORG)
- Europe (LOC)
- Model Y (PRODUCT)
- this month (DATE)
- EV (ORG)
- Chinese (NORP)
- Europe (LOC)
- Teslas (PERSON)
- last month (DATE)
- 261 (CARDINAL)
- February 2024 (DATE)
- 31 (CARDINAL)
- ACEA (ORG)
- second (ORDINAL)
- EV (ORG)
- EU (ORG)
- this year (DATE)
- Citi (GPE)
- EU (ORG)
- last week (DATE)
- Tesla (ORG)
- more than half a dozen (CARDINAL)
- European (NORP)
- January (DATE)
- 2024 (DATE)
- Teslas (PERSON)
- EU (ORG)
- EV (ORG)
- Tuesday (DATE)
- three-year (DATE)
- EU (ORG)
- 34 (CARDINAL)
- February (DATE)
- 237 (CARDINAL)
- second (ORDINAL)
- HEV (ORG)
- 19 (CARDINAL)
- PHEV (ORG)
- 584 (CARDINAL)
- February (DATE)
- 482 (CARDINAL)
- a year earlier 2025 (DATE)
- Europes (PERSON)
- Chris Heron (PERSON)
- Reuters (ORG)
- CO2 (PRODUCT)
- Volkswagen VOWGDE (ORG)
- 4 (CARDINAL)
- 108 (CARDINAL)
- a year earlier (DATE)
- EU Britain (GPE)
- European Free Trade Association (ORG)
- February (DATE)
- Stellantis STLAMMI (ORG)
- 162 (CARDINAL)
- SAIC Motor 600104SS (ORG)
- 261 (CARDINAL)
- a year earlier (DATE)
- EU (ORG)
- Chinese (NORP)
- 15 (CARDINAL)
- GEELYUL Volvo (ORG)
- ACEA (ORG)
- BYD (ORG)
- Chinese (NORP)
- 25 (CARDINAL)
- 15 a year (DATE)
- Spain (GPE)
- 11 year-on-year (DATE)
- the month (DATE)
- 64 (CARDINAL)
- Germany (GPE)
- 62 (CARDINAL)
- Italy (GPE)
- 07 (CARDINAL)
- France (GPE)

Entities from article 78: 
- LONDON (GPE)
- March 25 (DATE)
- Reuters - BlackRock (ORG)
- first (ORDINAL)
- Europe (LOC)
- more than 50 billion (MONEY)
- the United States (GPE)
- ETP (ORG)
- BlackRock (ORG)
- Switzerland (GPE)
- Paris (GPE)
- Amsterdam (GPE)
- Frankfurt Reuters (ORG)
- last month (DATE)
- BlackRock (ORG)
- BlackRock (ORG)
- one (CARDINAL)
- first (ORDINAL)
- US (GPE)
- the Securities and Exchange Commission (ORG)
- first (ORDINAL)
- January 2024 (DATE)
- Coinbase (ORG)
- Bank of New York Mellon (ORG)

Entities from article 79: 
- March 25 (DATE)
- Reuters - Chinese (ORG)
- DeepSeek (PRODUCT)
- V3 (CARDINAL)
- US (GPE)
- OpenAI (PERSON)
- DeepSeek-V3-0324 (PRODUCT)
- AI (GPE)
- Hugging Face DeepSeek (WORK_OF_ART)
- AI (GPE)
- recent months (DATE)
- Western (NORP)
- V3 (CARDINAL)
- December (DATE)
- R1 (PRODUCT)
- January (DATE)

Entities from article 80: 
- SEOUL March (ORG)
- 25 (CARDINAL)
- Reuters - South Koreas Hyundai Steel (ORG)
- 58 billion (CARDINAL)
- Hyundai Motor Group (ORG)
- US (GPE)
- Louisiana (GPE)
- annual (DATE)
- 27 million tonnes (QUANTITY)
- Tuesday (DATE)
- Hyundai Steel (ORG)
- more than 5 (CARDINAL)
- 7 (CARDINAL)
- US (GPE)
- Donald Trump (PERSON)
- Hyundai (ORG)
- Steels (ORG)
- US (GPE)
- Hyundai Motor Groups (ORG)
- 21 billion (MONEY)
- the United States (GPE)
- South Korean (NORP)
- Trump (ORG)
- the White House (FAC)
- Monday (DATE)
- Hyundai (ORG)
- US (GPE)
- Hyundai (ORG)
- South Korea (GPE)
- April 2 (DATE)
- South Korea (GPE)
- the United States (ORG)
- Hyundai Motor (ORG)
- Kia Corp 000270KS (ORG)
- Hyundai Motor (ORG)
- 33 (CARDINAL)
- as much as 75 (PERCENT)
- October 2024 (DATE)
- Kia (ORG)
- 21 (CARDINAL)
- Hyundai Steel (ORG)
- billions of dollars (MONEY)
- Hyundai Steel (ORG)
- Lee Tae-hwan (PERSON)
- Daishin Securities Hyundai Steel (ORG)
- half (CARDINAL)
- 2026 (DATE)
- 2029 (DATE)
- Hyundai Steel (ORG)
- Hyundai Motor (ORG)
- Kia Motors (ORG)
- the United States (GPE)

Entities from article 81: 
- WASHINGTON (GPE)
- March 24 (DATE)
- Reuters - US (ORG)
- Donald Trump (PERSON)
- Monday (DATE)
- April 2 (DATE)
- weeks (DATE)
- Trump (PERSON)
- 25 (CARDINAL)
- Venezuela (GPE)
- the White House (FAC)
- April 2 (DATE)
- White House (ORG)
- TBD (ORG)
- Bloomberg (GPE)
- the Wall Street Journal (ORG)
- Trump (ORG)
- weeks (DATE)
- April 2 (DATE)
- US (GPE)
- Monday (DATE)
- next week (DATE)
- 500 (CARDINAL)
- nearly 18 (CARDINAL)
- more than two weeks (DATE)
- Trump (PERSON)
- US (GPE)
- US (GPE)
- the next few days (DATE)
- Trump (ORG)
- the day (DATE)
- Trump (ORG)
- US (GPE)
- Trump (ORG)
- US (GPE)
- Monday (DATE)
- 21 billion (CARDINAL)
- South Koreas Hyundai Motor Group (ORG)
- the United States (GPE)
- 58 billion (CARDINAL)
- Louisiana (GPE)
- the White House (ORG)
- Hyundai (ORG)
- Euisun Chung (PERSON)
- Louisiana (GPE)
- Jeff Landry (PERSON)
- Trump (PERSON)
- April 2 (DATE)
- a Liberation Day (DATE)
- US (GPE)
- 12 trillion (CARDINAL)
- US (GPE)
- Trump (ORG)
- February (DATE)
- 25 (CARDINAL)
- three (CARDINAL)
- US (GPE)
- Trumps (PERSON)
- January (DATE)
- hours (TIME)
- 20 (CARDINAL)
- Chinese (NORP)
- 25 (CARDINAL)
- 25 (CARDINAL)
- Canada (GPE)
- Mexico (GPE)
- North American (NORP)
- US (GPE)
- Two (CARDINAL)
- Trump (ORG)
- Treasury (ORG)
- Scott Bessent (PERSON)
- White House (ORG)
- Kevin Hassett - (PERSON)
- last week (DATE)
- April 2 (DATE)
- Bessent (ORG)
- Dirty (ORG)
- 15 (CARDINAL)
- 15 (CARDINAL)
- Hassett (PERSON)
- Fox Business (ORG)
- 10 (CARDINAL)
- Ryan Majerus (PERSON)
- US Commerce Department (ORG)
- King Spalding (ORG)
- April 2 (DATE)
- Section 232 (LAW)
- early April (DATE)
- UK (GPE)
- India (GPE)
- the White House (ORG)
- second (ORDINAL)
- White House (ORG)
- the Office of the United States Trade Representative (ORG)
- US (GPE)
- USTR (ORG)
- Argentina (GPE)
- Australia (GPE)
- Brazil (GPE)
- Canada (GPE)
- China (GPE)
- the European Union (ORG)
- India (GPE)
- Indonesia (GPE)
- Japan (GPE)
- Korea (GPE)
- Malaysia (GPE)
- Mexico (GPE)
- Russia (GPE)
- Saudi Arabia (GPE)
- South Africa (GPE)
- Switzerland (GPE)
- Taiwan (GPE)
- Thailand (GPE)
- Turkey (GPE)
- Britain (GPE)
- Vietnam (GPE)
- 88 (CARDINAL)
- US (GPE)
- Monday (DATE)
- Venezuela (GPE)
- 25 (CARDINAL)
- the United States (GPE)
- April 2 (DATE)
- Trump (ORG)
- Truth Social (ORG)
- Venezuela (GPE)
- tens of thousands (CARDINAL)
- the United States (GPE)

Entities from article 82: 
- March 25 (DATE)
- Reuters - Oil (ORG)
- Tuesday (DATE)
- a fifth day (DATE)
- US (GPE)
- Venezuelan (NORP)
- Brent (PRODUCT)
- 27 cents (MONEY)
- 0749 (CARDINAL)
- GMT (ORG)
- US (GPE)
- 26 cents (MONEY)
- more than 1 (CARDINAL)
- US (GPE)
- Donald Trump (PERSON)
- 25 (CARDINAL)
- Venezuela Oil (ORG)
- Venezuelas (PERSON)
- China (GPE)
- US (GPE)
- Tuesday (DATE)
- Trump (PERSON)
- US (GPE)
- April 2 (DATE)
- Trumps (PERSON)
- US (GPE)
- Venezuelan (NORP)
- Iranian (NORP)
- Tsuyoshi Ueno (ORG)
- NLI Research Institute (ORG)
- 70 (DATE)
- the rest of the year (DATE)
- US (GPE)
- Last week (DATE)
- US (GPE)
- Iranian (NORP)
- Trump (PERSON)
- Monday (DATE)
- May 27 (DATE)
- US (GPE)
- Chevron CVXN (PERSON)
- Venezuela (GPE)
- about 200000 barrels (QUANTITY)
- ANZ (ORG)
- Trump (ORG)
- April 2 (DATE)
- weeks (DATE)
- OPEC the Organization of the Petroleum Exporting Countries (ORG)
- Russia (GPE)
- a second consecutive month (DATE)
- May four (DATE)
- Reuters (ORG)

Entities from article 83: 
- March 25 (DATE)
- Reuters - China (ORG)
- AI (GPE)
- the United States (GPE)
- just three months (DATE)
- DeepSeek (PRODUCT)
- Chinese (NORP)
- Lee Kai-fu (PERSON)
- Lee (PERSON)
- Google China (LOC)
- Reuters (ORG)
- DeepSeek (PRODUCT)
- China (GPE)
- DeepSeek (PRODUCT)
- AI (GPE)
- January (DATE)
- US (GPE)
- Chinas AI (ORG)
- a six to nine month (DATE)
- three months (DATE)
- Lee (PERSON)
- Hong Kong Washingtons (GPE)
- Chinese (NORP)
- Chinese (NORP)
- DeepSeek (PRODUCT)
- US (GPE)
- Lee (PERSON)
- DeepSeek (PRODUCT)
- OpenAI (ORG)
- Chinas tech (ORG)
- AI (ORG)
- late 2022 (DATE)
- DeepSeeks (PRODUCT)
- Western (NORP)
- Lee (PERSON)
- 01AI (CARDINAL)
- March 2023 (DATE)
- AI (GPE)
- Moonshot (PERSON)
- Chinese (NORP)
- Baidu Alibaba and (ORG)
- ByteDance (ORG)
- Lee (PERSON)
- AI (ORG)
- Earlier this month 01AI (DATE)
- Wanzhi (PRODUCT)
- AI (ORG)
- 2025 (DATE)
- 15 million (CARDINAL)
- last year (DATE)
- Lee (PERSON)

Entities from article 84: 
- March 26 (DATE)
- Reuters - Tokyo Gas (ORG)
- 9531 (DATE)
- the 2026 fiscal year (DATE)
- the United States (GPE)
- Japans (NORP)
- Wednesday (DATE)
- 131 billion yen 871 million (MONEY)
- 2025-26 (DATE)
- fiscal year (DATE)
- April 1 (DATE)
- 72 billion yen (MONEY)
- the year ending this March (DATE)
- 10 yen (MONEY)
- 80 yen (MONEY)
- the current fiscal year (DATE)
- up to 120 billion yen (MONEY)
- the first half of the 2026 fiscal year (DATE)
- Tokyo Gas (ORG)
- the United States (GPE)
- LNG (ORG)
- Singapore (GPE)
- London (GPE)
- over 11 trillion yen (MONEY)
- over 200 billion yen (MONEY)
- fiscal years 2026-2028 (DATE)
- around 100 billion yen (MONEY)
- US (GPE)
- Elliott Management (PERSON)
- 503 (CARDINAL)
- Tokyo Gas (ORG)
- Tokyo Gas (ORG)
- 2 (CARDINAL)
- Tokyo (GPE)

Entities from article 85: 
- LONDON (GPE)
- March 26 (DATE)
- Reuters - European (ORG)
- next year (DATE)
- AI (GPE)
- recent weeks (DATE)
- January (DATE)
- Chinese (NORP)
- DeepSeek (PRODUCT)
- Gen (PERSON)
- AI (ORG)
- RELX (ORG)
- SAP (ORG)
- Chipmaker Nvidia NVDAO (ORG)
- DeepSeeks AI (PRODUCT)
- 29 year (DATE)
- Europe (LOC)
- AI (GPE)
- ASM International ASMIAS (ORG)
- BE Semiconductor BESIAS (ORG)
- 25 (CARDINAL)
- 20 (CARDINAL)
- US (GPE)
- Frances Schneider Electric SCHNPA (PERSON)
- 14 (CARDINAL)
- LSEG LSEGL (PERSON)
- 55 (CARDINAL)
- 16 (CARDINAL)
- German (NORP)
- SAP (ORG)
- 29 (CARDINAL)
- Monday (DATE)
- Novo Nordisk (PERSON)
- AI (GPE)
- Gerry Fowler (PERSON)
- European (NORP)
- UBS With DeepSeek (ORG)
- AI (ORG)
- AI (GPE)
- January (DATE)
- over 100 (CARDINAL)
- Fidelity (ORG)
- almost 72 (CARDINAL)
- AI (GPE)
- 2025 (DATE)
- Fidelity (ORG)
- five years (DATE)
- European (NORP)
- Reuters (ORG)
- AI (GPE)
- Steve Wreford (PERSON)
- Lazard Asset Management Wreford (ORG)
- AI (GPE)
- 2025 (DATE)
- 2026 (DATE)
- 600 (CARDINAL)
- 17 (CARDINAL)
- AI (ORG)
- SAP (ORG)
- LSEG (PERSON)
- LSEG (ORG)
- Bernie Ahkong (PERSON)
- UBS OConnor (ORG)
- the end of 2025 (DATE)
- the year (DATE)
- next quarter (DATE)
- multi-year (DATE)
- Q4 (GPE)
- AI (GPE)
- Paddy Flood (ORG)
- Schroders (ORG)
- Fabio di Giansante (PERSON)
- European (NORP)
- Amundi Europes (ORG)
- European AI (ORG)
- This year (DATE)
- the year (DATE)

Entities from article 86: 
- March 26 (DATE)
- Reuters - Chinese (ORG)
- BYD (ORG)
- China (GPE)
- more than 800000 (CARDINAL)
- 2025 (DATE)
- Tuesday (DATE)
- BYD (ORG)
- 417204 (CARDINAL)
- 2024 (DATE)
- Britain (GPE)
- Chinese (NORP)
- Wang Chuanfu (PERSON)
- Reuters (ORG)
- Latin American (NORP)
- Southeast Asian (NORP)
- Chinese (NORP)
- Chinese (NORP)
- BYD (ORG)
- China (GPE)
- Wang (ORG)
- BYD (ORG)
- Wednesday (DATE)
- BYD (ORG)
- Chinese (NORP)
- Australia (GPE)
- Germany (GPE)
- Wang (ORG)
- last year (DATE)
- BYD (ORG)
- Tuesday (DATE)
- Wang (ORG)
- BYD (ORG)
- Brazil (GPE)
- China (GPE)
- last year (DATE)
- Thailand Hungary (GPE)
- Turkey (GPE)
- Wang (ORG)
- BYD (ORG)
- Canada (GPE)
- the United States (GPE)
- Trump (PERSON)
- 100 (CARDINAL)
- Chinese (NORP)
- Canada Wang (PERSON)
- Toyotas (ORG)
- Japanese (NORP)
- Toyota (ORG)
- 7203 (PRODUCT)
- 108 million (CARDINAL)
- 2024 (DATE)
- BYD (ORG)
- 427 million (CARDINAL)
- BYD (ORG)
- 55 million (CARDINAL)
- this year (DATE)
- Chinese (NORP)
- Seagull (PRODUCT)
- less than 10000 (CARDINAL)
- BYD (ORG)
- as many as 8000 (CARDINAL)
- 5000 (CARDINAL)
- Wang (ORG)
- 2026 (DATE)
- 2027 (DATE)

Entities from article 87: 
- AVALON March 26 (ORG)
- Reuters - Anduril (ORG)
- Christian Brose (PERSON)
- AI (GPE)
- Donald Trump (PERSON)
- Trump (ORG)
- Brose (PERSON)
- Republican (NORP)
- John McCain (PERSON)
- Anduril (PERSON)
- Anduril (ORG)
- December (DATE)
- OpenAI (ORG)
- Brose (PERSON)
- Trump (ORG)
- Trump (ORG)
- the Australian International Air Show (ORG)
- Wednesday (DATE)
- Trump (PERSON)
- Trump (ORG)
- Palmer Luckey (PERSON)
- Pentagon (ORG)
- last month (DATE)
- US (GPE)
- about 50 billion (MONEY)
- 8 (CARDINAL)
- Brose (PERSON)
- Anduril (PERSON)
- Ohio (GPE)
- Brose (PERSON)
- the United States (GPE)
- Australia (GPE)
- The Australian Defence Force (ORG)
- Base Darwin (PERSON)
- US (GPE)
- six months of the year (DATE)
- Australia (GPE)
- the Australian Defence Departments Guided Weapons and Explosive Ordnance Enterprise (ORG)
- Australia (GPE)
- David Goodrich (PERSON)
- Anduril (PERSON)
- Ghost Shark (GPE)
- the Australian Defence Force (ORG)
- Goodrich (PERSON)
- Anduril (PERSON)
- New South Wales (LOC)
- the United States (GPE)
- Britain (GPE)
- Australia (GPE)
- more than A360 billion (MONEY)
- several decades (DATE)
- Australia Brose (ORG)
- Ghost Shark (PERSON)

Entities from article 88: 
- March 26 (DATE)
- Reuters - Chicago Federal Reserve Bank (ORG)
- Austan Goolsbee (PERSON)
- 12-18 months (DATE)
- Financial Times (ORG)
- Goolsbee (PERSON)
- Feds (NORP)
- this year (DATE)
- Fed (ORG)
- 425 (CARDINAL)
- March (DATE)
- later this year (DATE)
- Fed (ORG)
- Jerome Powell (PERSON)
- Donald Trumps (PERSON)
- this year (DATE)
- Feds (NORP)

Entities from article 89: 
- the day ahead (DATE)
- European (NORP)
- Kevin Buckland (PERSON)
- Wednesday (DATE)
- Asia (LOC)
- Donald Trump (PERSON)
- April 2 (DATE)
- that day (DATE)
- the week (DATE)
- 12 (CARDINAL)
- 03 (CARDINAL)
- Tuesday (DATE)
- Japans Nikkei N225 (LAW)
- 1 (CARDINAL)
- more than 1 (CARDINAL)
- 03 just after (TIME)
- noon (TIME)
- US (GPE)
- pan-European (NORP)
- 50 (CARDINAL)
- 01 (CARDINAL)
- Trump (ORG)
- Monday (DATE)
- April 2 (DATE)
- 25 (CARDINAL)
- Venezuelan (NORP)
- Trump (PERSON)
- next Wednesday Liberation Day (DATE)
- the day (DATE)
- one (CARDINAL)
- Trumps (PERSON)
- European (NORP)
- UK (GPE)
- Bank of England (ORG)
- Trumps (PERSON)
- British (NORP)
- Rachel Reeves (PERSON)
- later today (TIME)
- an additional 22 billion pounds 284 billion (MONEY)
- France (GPE)
- Bank of France (ORG)
- Francois Villeroy de Galhau (PERSON)
- US (GPE)
- Fed (ORG)
- Neel Kashkari (PERSON)
- St Louis Fed (ORG)
- Alberto Musalem (PERSON)
- Wednesday (DATE)
- February (DATE)
- March (DATE)
- February -UK (DATE)
- -Minneapolis Feds Kashkari St Louis Feds Musalem (PERSON)
- 1 07733 pounds (QUANTITY)

Entities from article 90: 
- March 26 (DATE)
- Reuters - Tesla (ORG)
- Saudi Arabia (GPE)
- early next month (DATE)
- Elon Musks EV (ORG)
- the Middle East (LOC)
- Saudi Arabia (GPE)
- Gulf (LOC)
- Tesla TSLAO (ORG)
- EV (ORG)
- Europe (LOC)
- the United States (GPE)
- Musk (DATE)
- US (GPE)
- Donald Trump (PERSON)
- Riyadh (GPE)
- April 10 (DATE)
- Teslas (PERSON)
- Cybercab (GPE)
- Optimus (PERSON)
- AI (GPE)
- Teslas (PERSON)
- Europe (LOC)
- this year (DATE)
- EV (ORG)
- 426 (CARDINAL)
- Europe (LOC)
- the European Automobile Manufacturers Association (ORG)
- Tuesday (DATE)
- US (GPE)
- Tesla Takedown (ORG)
- the Department of Government Efficiency (ORG)
- thousands (CARDINAL)
- thousands (CARDINAL)
- The Wall Street Journal (ORG)
- 2023 (DATE)
- Saudi Arabia (GPE)
- Tesla (ORG)
- Lucid Group (ORG)
- one (CARDINAL)
- EV (ORG)
- Tesla (NORP)

Entities from article 91: 
- March 26 (DATE)
- Reuters - Japans (ORG)
- 120-130 (CARDINAL)
- 150 (CARDINAL)
- Reuters (ORG)
- 120 (CARDINAL)
- 130 (CARDINAL)
- Japans (NORP)
- Satsuki Katayama (PERSON)
- Liberal Democratic Partys LDP (ORG)
- Tuesday (DATE)
- Japanese (NORP)
- past 150 (CARDINAL)
- this week (DATE)
- US (GPE)
- US (GPE)
- the Bank of Japan (ORG)
- Japanese (NORP)
- Katayama (PERSON)
- US (GPE)
- Donald Trumps (PERSON)
- Katayama (PERSON)
- LDP (ORG)
- Katayama (PERSON)
- The Nippon Individual Savings Account NISA (ORG)
- 2024 (DATE)
- one (CARDINAL)
- Katayama (PERSON)
- LDP (ORG)
- annual (DATE)
- around June (DATE)

Entities from article 92: 
- March 26 (DATE)
- Reuters (ORG)
- Credit Suisse (ORG)
- Swiss (NORP)
- UBS (ORG)
- 2023 (DATE)
- Credit Suisse (ORG)
- Switzerland (GPE)
- one (CARDINAL)
- UBS (ORG)
- Swiss (NORP)
- Swiss (NORP)
- this month (DATE)
- Credit Suisse (ORG)
- UBS (ORG)
- Sergio Ermotti (PERSON)
- March 19 (DATE)
- UBS (ORG)
- two (CARDINAL)
- Morgan Stanley JPMorgan Goldman Sachs (ORG)
- HSBC (ORG)
- UBS (ORG)
- over 40 billion (MONEY)
- Credit Suisse (ORG)
- 100 (CARDINAL)
- 60 (CARDINAL)
- FINMA (ORG)
- UBS (ORG)
- Reuters (ORG)
- UBS (ORG)
- around 30 (CARDINAL)
- two (CARDINAL)
- about 21 (CARDINAL)
- end-2024 (DATE)
- nearly two-thirds (CARDINAL)
- 2008 (DATE)
- UBS (ORG)
- Swiss (NORP)
- UBS (ORG)
- Credit Suisse (ORG)
- as much as 19 billion (PERCENT)
- UBS (ORG)
- 5 billion (CARDINAL)
- two (CARDINAL)
- FINMA (ORG)
- UBS (ORG)
- UBS (ORG)
- May (DATE)
- UBS (ORG)
- 2028 (DATE)
- UBS (ORG)
- Switzerland (GPE)
- two (CARDINAL)
- Last months (DATE)
- UBS (ORG)
- Franziska Ryser (PERSON)
- Green Party (GPE)
- Reuters Extreme (ORG)

Entities from article 93: 
- March 25 (DATE)
- Reuters - Artificial (ORG)
- California (GPE)
- Tuesday (DATE)
- Universal Music Group (ORG)
- AI (GPE)
- Claude US (ORG)
- Eumi Lee (PERSON)
- Anthropics (PRODUCT)
- Concord (ORG)
- ABKCO (ORG)
- 2023 (DATE)
- at least 500 (CARDINAL)
- Beyonc the Rolling Stones (ORG)
- Claude (PERSON)
- AI (GPE)
- Tech (ORG)
- OpenAI Microsoft (ORG)
- Meta Platforms (ORG)
- US (GPE)
- Fair (ORG)
- Lees (ORG)
- Lee (PERSON)
- Court (ORG)
- AI (GPE)
- Lee (PERSON)

Entities from article 94: 
- WASHINGTON (GPE)
- March 25 (DATE)
- Reuters (ORG)
- US (GPE)
- six (CARDINAL)
- Inspur Group Chinas (ORG)
- dozens (CARDINAL)
- Chinese (NORP)
- Tuesday (DATE)
- Inspur (GPE)
- Chinese (NORP)
- the Commerce Department (ORG)
- Five (CARDINAL)
- China (GPE)
- one (CARDINAL)
- Taiwan Inspur Group (ORG)
- 2023 (DATE)
- Inspur (GPE)
- about 80 (CARDINAL)
- Tuesday (DATE)
- China Others (ORG)
- Taiwan (GPE)
- Iran (GPE)
- Pakistan (GPE)
- South Africa (GPE)
- the United Arab Emirates (GPE)
- Chinas (ORG)
- AI (GPE)
- Chinas (ORG)
- American (NORP)
- American (NORP)
- Commerce (ORG)
- Howard Lutnick Chinas (PERSON)
- Wednesday (DATE)
- US (GPE)
- Chinese (NORP)
- Chinese (NORP)
- Washington (GPE)
- Tuesday (DATE)
- US (GPE)
- The Inspur Group (ORG)
- US (GPE)
- Irans (NORP)
- the Commerce Departments Entity List (ORG)
- Commerce (ORG)
- Jeffrey Kessler (PERSON)
- US (GPE)
- UAVs (ORG)
- Inspur Group (PERSON)
- 2023 (DATE)
- AMD (ORG)
- Nvidia NVDAO (ORG)
- Inspurs (ORG)
- Reuters (ORG)
- US (GPE)
- Nvidia (GPE)
- AMD (ORG)
- Chinese (NORP)
- Nettrix Information Industry Co Suma Technology Co (ORG)
- Suma-USI Electronics (ORG)
- US (GPE)
- Chinese (NORP)
- Sugon (ORG)
- Dawning Information Industry Co 603019SS (ORG)
- the Entity List (LAW)
- 2019 (DATE)
- the Commerce Department (ORG)
- US (GPE)
- Chinas (ORG)
- Huawei (ORG)
- Chinas AI (ORG)
- The Beijing Academy of Artificial Intelligence BAAI (ORG)
- Chinese (NORP)
- US (GPE)
- Wednesday (DATE)
- US (GPE)

Entities from article 95: 
- TAIPEI (ORG)
- March 26 (DATE)
- Reuters - Taiwans central bank (ORG)
- Wednesday (DATE)
- US (GPE)
- Donald Trump (PERSON)
- Washington (GPE)
- Trump (ORG)
- Treasury (ORG)
- Scott Bessent (PERSON)
- April 2 (DATE)
- 15 (CARDINAL)
- Bessent (ORG)
- 15 (CARDINAL)
- US Census Bureau (ORG)
- Taiwan (GPE)
- one (CARDINAL)
- 15 (CARDINAL)
- the United States (GPE)
- China (GPE)
- South Korea (GPE)
- the European Union (ORG)
- Taiwans (NORP)
- last year (DATE)
- 143 (CARDINAL)
- US (GPE)
- Taiwans (NORP)
- the United States (GPE)
- US (GPE)
- Taiwan (GPE)
- the United States (GPE)
- 83 last year (DATE)
- US (GPE)
- Taiwan (GPE)
- Taiwans (NORP)
- the United States (GPE)
- Taiwan (GPE)
- the United States (GPE)
- Taiwan (GPE)
- the US Treasury Department (ORG)
- Trumps (PERSON)

Entities from article 96: 
- March 26 (DATE)
- Reuters - Porsche (ORG)
- Volkswagens (GPE)
- Wednesday (DATE)
- 20-billion-euro (MONEY)
- 2157 billion (CARDINAL)
- Europes (PERSON)
- 191 (CARDINAL)
- 256 (CARDINAL)
- last year (DATE)
- Volkswagen (ORG)
- 15 (CARDINAL)
- last year (DATE)
- Lutz Meschke (PERSON)
- Porsche AG (ORG)
- Porsche (ORG)
- Piech (NORP)
- Volkswagen (ORG)
- Porsche (ORG)
- German (NORP)
- earlier in March (DATE)
- Porsche (ORG)
- Piech (NORP)
- Porsche (ORG)
- Volkswagen (ORG)
- Porsche (ORG)
- 1 09273 (DATE)

Entities from article 97: 
- March 25 (DATE)
- Reuters - Canada (ORG)
- Tesla TSLAO (ORG)
- EV (ORG)
- Transport (ORG)
- Chrystia Freeland (PERSON)
- Tuesday (DATE)
- Freeland (ORG)
- Freeland (ORG)
- Tesla (NORP)
- US (GPE)
- Canada (GPE)
- Reuters (ORG)
- US (GPE)
- Donald Trump (PERSON)
- early April (DATE)
- 25 (CARDINAL)
- Canada (GPE)
- Mexico (GPE)
- Monday (DATE)
- April 2 (DATE)
- Canada (GPE)
- C43 million 3011 million (MONEY)
- Canadian (NORP)
- Mark Carney (PERSON)
- April 28 (DATE)
- the Toronto Star (ORG)
- earlier this month (DATE)
- Tesla (ORG)
- EV (ORG)
- the final days (DATE)
- January (DATE)
- Tesla (NORP)
- Quebec City (GPE)
- more than 4000 (CARDINAL)
- a single weekend (DATE)
- Toronto (GPE)
- Tesla (NORP)
- US (GPE)
- earlier this month (DATE)
- Tesla (PERSON)
- Elon Musk (PERSON)
- Trump (ORG)
- the White House (ORG)
- Department of Government Efficiency (ORG)
- 1 14279 Canadian dollars (PERCENT)

Entities from article 98: 
- NEW DELHI (GPE)
- March 29 (DATE)
- Reuters - Indian (ORG)
- US (GPE)
- several days (DATE)
- New Delhi (GPE)
- Saturday (DATE)
- Indias commerce ministry (ORG)
- US (GPE)
- Brendan Lynch (ORG)
- US (GPE)
- South (LOC)
- Central Asia (LOC)
- March 26-29 (DATE)
- US (GPE)
- State (ORG)
- Christopher Landau (PERSON)
- Indian (NORP)
- Vikram Misri (PERSON)
- Landau (PERSON)
- India (GPE)
- the United States (GPE)
- the US Department of State (ORG)
- US (GPE)
- Donald Trump (PERSON)
- April 2 (DATE)
- India (GPE)
- first (ORDINAL)
- India (GPE)
- US (GPE)
- Indias commerce ministry (ORG)
- Last month (DATE)
- Narendra Modis (PERSON)
- Washington (GPE)
- India (GPE)
- US (GPE)
- two (CARDINAL)
- 500 billion (MONEY)
- 2030 (CARDINAL)
- India (GPE)
- US (GPE)
- India (GPE)
- US (GPE)
- Washington (GPE)
- earlier this month (DATE)
- US (GPE)
- Trade (ORG)
- Jamieson Greer (PERSON)
- Commerce (ORG)
- Howard Lutnick Sector- (PERSON)
- the coming weeks (DATE)
- US (GPE)
- 456 billion (MONEY)
- India (GPE)
- US (GPE)
- about 22 (CARDINAL)
- 12 (CARDINAL)
- World Trade Organization (ORG)

Entities from article 99: 
- March 29 (DATE)
- Reuters (ORG)
- Glass Lewis (PERSON)
- Goldman Sachs (ORG)
- Glass Lewis (PERSON)
- Friday (DATE)
- Glass Lewis (PERSON)
- 160 million (CARDINAL)
- David Solomon (PERSON)
- John Waldron (PERSON)
- January (DATE)
- the additional 160 million (MONEY)
- 2025 (CARDINAL)
- Glass Lewis (PERSON)
- Goldman Sachs (ORG)
- Saturday (DATE)

Entities from article 100: 
- March 28 (DATE)
- Reuters - Private (ORG)
- Blackstone (ORG)
- TikToks US (LOC)
- two (CARDINAL)
- Blackstone (ORG)
- Chinese (NORP)
- ByteDances (ORG)
- non-Chinese (NORP)
- Susquehanna International Group (ORG)
- TikToks US (ORG)
- TikToks US (LOC)
- Chinese (NORP)
- 20 (CARDINAL)
- US (GPE)
- TikTok General Atlantic (ORG)
- Blackstone (ORG)
- Susquehanna (PERSON)
- nearly half (CARDINAL)
- Americans (NORP)
- last year (DATE)
- ByteDance (ORG)
- TikTok (ORG)
- January 19 (DATE)
- TikTok (NORP)
- US (GPE)
- January (DATE)
- the Supreme Court (ORG)
- days later (DATE)
- US (GPE)
- Donald Trump (PERSON)
- April 5 (DATE)
- Trump (ORG)
- China (GPE)
- US (GPE)
- JD Vance (PERSON)
- April (DATE)
- ByteDance (ORG)
- Chinese (NORP)
- US (GPE)
- TikTok (ORG)
- last year (DATE)
- about 58 (CARDINAL)
- ByteDance (ORG)
- Chinese (NORP)
- Zhang Yiming (PERSON)
- 21 (CARDINAL)
- about 7000 (CARDINAL)
- Americans (NORP)
- 21 (CARDINAL)
- The White House (ORG)
- Reuters (ORG)
- January (DATE)
- Trumps (PERSON)
- TikTok (ORG)
- Oracle ORCLN (ORG)
- ByteDance (ORG)

Entities from article 101: 
- MILAN (ORG)
- March 29 (DATE)
- Reuters (ORG)
- Italys (PERSON)
- second (ORDINAL)
- European Central Bank (ORG)
- 14 billion (MONEY)
- Banco BPM BAMIMI (ORG)
- UniCredits (ORG)
- BPM (ORG)
- one (CARDINAL)
- Italian (NORP)
- 2008-2012 (DATE)
- UniCredit (ORG)
- Sunday (DATE)
- Friday (DATE)
- Italian (NORP)
- Consob (ORG)
- the coming week (DATE)
- UniCredit (ORG)
- a month or so (DATE)
- UniCredit (ORG)
- Andrea Orcel (PERSON)
- Germanys Commerzbank CBKGDE (FAC)
- Italian (NORP)
- Generali GASIMI (PERSON)
- UniCredit (ORG)
- Banco BPM (ORG)
- November weeks (DATE)
- Anima Holding (PERSON)
- 18 billion (MONEY)
- Anima (WORK_OF_ART)
- BPM (ORG)
- this week (DATE)
- ECB (ORG)
- BPM (ORG)
- Danish (NORP)
- Anima (ORG)
- Thursday (DATE)
- UniCredit (ORG)
- BPM (ORG)
- Danish Compromise (ORG)
- Friday (DATE)
- Anima (PERSON)
- zero (CARDINAL)

Entities from article 102: 
- March 29 (DATE)
- Reuters - US (ORG)
- Donald Trump (PERSON)
- the Washington Post (ORG)
- Saturday (DATE)
- four (CARDINAL)
- Capitol Hill (ORG)
- Trump (ORG)
- US (GPE)
- Trump (ORG)
- recent days (DATE)
- The White House (ORG)
- Trump (ORG)
- first (ORDINAL)
- Post (ORG)
- the United States (GPE)
- trillions (CARDINAL)
- the Post Earlier (ORG)
- Friday (DATE)
- Trump (PERSON)
- US (GPE)
- April 2 (DATE)

Entities from article 103: 
- March 29 (DATE)
- Reuters - Chinese (ORG)
- Hong Kong (GPE)
- CK (ORG)
- the Panama Canal (FAC)
- Saturday (DATE)
- minutes later (TIME)
- CCTV (ORG)
- China (GPE)
- Hong Kong (GPE)
- CK Hutchison (ORG)
- Chinas national (ORG)
- Yuyuantantian (NORP)
- minutes (TIME)
- China (GPE)
- Friday (DATE)
- Friday (DATE)
- Reuters (ORG)
- CK Hutchison (ORG)

Entities from article 104: 
- TURIN Italy (ORG)
- March 29 (DATE)
- Reuters - Stellantis (ORG)
- Tesla TSLAO (ORG)
- 2025 (DATE)
- meet European Unions CO2 (ORG)
- Brussels (GPE)
- three years (DATE)
- Europe (LOC)
- Saturday (DATE)
- Carmakers (NORP)
- EU (ORG)
- this year (DATE)
- EV (ORG)
- Tesla (ORG)
- second (ORDINAL)
- Tesla (ORG)
- earlier this month (DATE)
- the European Commission (ORG)
- European (NORP)
- 2025-2027 (DATE)
- 2025 (DATE)
- Stellantis (ORG)
- Tesla (ORG)
- this year (DATE)
- European (NORP)
- Jean-Philippe Imparato (ORG)
- Ill (PERSON)
- Imparato (GPE)
- Turin (GPE)
- Stellantis (ORG)
- EV (ORG)
- European (NORP)
- 14 (CARDINAL)
- 21 (CARDINAL)
- EU (ORG)
- 2027 (DATE)
- Imparato (ORG)
- Fiat (ORG)
- 500 (CARDINAL)
- Stellantis Mirafiori (LOC)
- Turin (GPE)
- November (DATE)
- annual (DATE)
- EV (ORG)
- 130000 (CARDINAL)

Entities from article 105: 
- March 28 (DATE)
- Reuters - Artificial (ORG)
- Scale AI (PERSON)
- Business Insider (ORG)
- Friday (DATE)
- AI (GPE)
- Startups (NORP)
- AI (ORG)
- California (GPE)
- Scale (ORG)
- nearly 14 billion (MONEY)
- last year (DATE)
- Reuters (ORG)
- Founded (PERSON)
- 2016 (DATE)
- Scale AI (PERSON)
- Nvidia NVDAO (ORG)
- Amazon (ORG)
- Meta METAO (PERSON)
- The US Department of Labor (ORG)
- Scale AI (ORG)
- the Fair Labor Standards Act (LAW)

Entities from article 106: 
- March 28 (DATE)
- Reuters - US (ORG)
- Donald Trump (PERSON)
- Ozy Media (ORG)
- Carlos Watsons (PERSON)
- 10-year (DATE)
- the White House (ORG)
- Friday (DATE)
- Watson (PERSON)
- last year (DATE)
- Google (ORG)
- Oprah Winfrey Watson (PERSON)
- 116 months (DATE)
- Brooklyn (GPE)
- December (DATE)
- The White House (ORG)
- Trump (PERSON)
- CNBC (ORG)
- Watson (PERSON)
- Trump (PERSON)
- Alice Marie Johnson (PERSON)
- Prosecutors (WORK_OF_ART)
- Watson (PERSON)
- California (GPE)
- Watson (PERSON)
- Founded (PERSON)
- 2013 (DATE)
- 2021 (DATE)
- YouTube (ORG)
- Goldman Sachs (ORG)

Entities from article 107: 
- March 28 (DATE)
- Reuters - Wall Street (ORG)
- Friday (DATE)
- Amazon Microsoft (ORG)
- US (GPE)
- Trump (ORG)
- US (GPE)
- February (DATE)
- 13 months (DATE)
- University of Michigan (ORG)
- 12-month (DATE)
- March (DATE)
- the next year (DATE)
- US (GPE)
- Donald Trump (PERSON)
- January (DATE)
- the Federal Reserve (ORG)
- Apple AAPLO (ORG)
- 27 (CARDINAL)
- Microsoft (ORG)
- 3 (CARDINAL)
- Amazon (ORG)
- 43 One (CARDINAL)
- the coming months (DATE)
- Greg Bassuk (PERSON)
- AXS Investments (ORG)
- New York (GPE)
- 500 (CARDINAL)
- 197 (CARDINAL)
- 558094 (DATE)
- Nasdaq (ORG)
- 270 (CARDINAL)
- 1732299 (DATE)
- 169 (CARDINAL)
- Ten (CARDINAL)
- 11 (CARDINAL)
- 381 (CARDINAL)
- 327 (CARDINAL)
- 76 (CARDINAL)
- Fed (ORG)
- 25 (CARDINAL)
- June (DATE)
- CME FedWatch (ORG)
- Fridays (DATE)
- about 9 (CARDINAL)
- February 19 (DATE)
- Nasdaq (ORG)
- 14 (CARDINAL)
- December 16 (DATE)
- Bob Doll (PERSON)
- Crossmark Investments Part (ORG)
- Ill (GPE)
- 23 (CARDINAL)
- CBOE (ORG)
- almost 3 (CARDINAL)
- one-week (DATE)
- CoreWeaves (ORG)
- nearly 3 (CARDINAL)
- Nvidia (GPE)
- Nasdaq (ORG)
- Friday (DATE)
- Trumps (PERSON)
- 25 (CARDINAL)
- next week (DATE)
- a second day (DATE)
- General Motors GMN (ORG)
- 11 (CARDINAL)
- Ford (ORG)
- 18 (CARDINAL)
- the week (DATE)
- 500 (CARDINAL)
- 15 (CARDINAL)
- Nasdaq (ORG)
- 26 (CARDINAL)
- Dow (ORG)
- about 1 Attention (CARDINAL)
- Trump (PERSON)
- April 2 (DATE)
- Trump (ORG)
- Lululemon Athletica LULUO (PERSON)
- 14 (CARDINAL)
- annual (DATE)
- Gold Fields (ORG)
- 95 (CARDINAL)
- 45 (DATE)
- first (ORDINAL)
- quarterly (DATE)
- six quarters (DATE)
- Nasdaq (ORG)
- quarterly (DATE)
- 2022 (DATE)
- UBS Global Wealth Management (ORG)
- year-end (DATE)
- 500 (CARDINAL)
- 6600 (CARDINAL)
- 52 (CARDINAL)
- 500 (CARDINAL)
- 45 (CARDINAL)
- 500 (CARDINAL)
- 10 (CARDINAL)
- 23 (CARDINAL)
- Nasdaq (ORG)
- 35 (CARDINAL)
- 358 (CARDINAL)
- US (GPE)
- 143 billion (CARDINAL)
- 162 billion (CARDINAL)
- 20 (CARDINAL)

Entities from article 108: 
- March 29 (DATE)
- Reuters - The Trump (ORG)
- French (NORP)
- US (GPE)
- US (GPE)
- European (NORP)
- Certification Regarding Compliance With Applicable (ORG)
- the United States (GPE)
- Europe (LOC)
- Donald Trumps (PERSON)
- America First (EVENT)
- US (GPE)
- France (GPE)
- US (GPE)
- Diversity Equity (ORG)
- France (GPE)
- European (NORP)
- Trump (PERSON)
- DEI (ORG)
- Trumps (PERSON)
- French (NORP)
- Les Echos (FAC)
- first (ORDINAL)
- Friday (DATE)
- US (GPE)
- Paris (GPE)
- 14173 (DATE)
- Restoring Merit (PERSON)
- Opportunities (ORG)
- Trump (PERSON)
- the US Government (ORG)
- French (NORP)
- Le Figaro (FAC)
- English (LANGUAGE)
- five days (DATE)
- Reuters (ORG)
- the United States (GPE)
- Frances (ORG)
- Orange ORANPA (NORP)
- US (GPE)
- Thales TCFPPA (ORG)
- TotalEnergies TTEFPA (ORG)
- US (GPE)
- Orange (NORP)
- French (NORP)
- Eric Lombard (PERSON)
- US (GPE)
- US (GPE)
- US (GPE)
- European (NORP)

Entities from article 109: 
- March 28 (DATE)
- Reuters - Elon Musks xAI (ORG)
- 33 billion (MONEY)
- Grok (PERSON)
- Xs (PERSON)
- Tesla (ORG)
- SpaceX (PERSON)
- Today (DATE)
- 80 billion and (MONEY)
- 33 billion 45B (MONEY)
- Xs (PERSON)
- US (GPE)
- Donald Trump (PERSON)
- the Department of Government Efficiency (ORG)
- Saudi Arabian (NORP)
- Prince Alwaleed bin Talal (PERSON)
- Kingdom Holding (ORG)
- second (ORDINAL)
- Gil Luria (PERSON)
- 45 billion (CARDINAL)
- 1 billion (CARDINAL)
- Twitter (PERSON)
- 2022 (DATE)
- two (CARDINAL)
- Grok (PERSON)
- less than two years ago (DATE)
- 10 billion (MONEY)
- 75 billion (CARDINAL)
- Microsoft (ORG)
- OpenAI (GPE)
- Chinese (NORP)
- DeepSeek (LAW)
- February Musk 53 (DATE)
- 974 billion (MONEY)
- OpenAI (ORG)
- this month (DATE)
- AI (GPE)
- Memphis (GPE)
- Tennessee (GPE)
- Colossus (ORG)
- Grok-3 (WORK_OF_ART)
- February (DATE)
- Twitter Musk (PERSON)
- Trump (ORG)
- seven (CARDINAL)
- 13 billion (CARDINAL)
- two years (DATE)
- last month (DATE)
- AI (GPE)
- Xs (PERSON)
- the previous two quarters (DATE)
- two (CARDINAL)
- Espen Robak (PERSON)
- Pluris Valuation Advisors (ORG)
- US (GPE)
- Friday (DATE)
- Twitter (PERSON)

Entities from article 110: 
- March 28 (DATE)
- Reuters (ORG)
- the end of the year (DATE)
- the full 40 billion (MONEY)
- SoftBank (ORG)
- Friday (DATE)
- 9984 (DATE)
- 20 billion (CARDINAL)
- Microsoft (ORG)
- OpenAI (GPE)
- the end of the year (DATE)
- The Wall Street Journal (ORG)
- first (ORDINAL)
- OpenAI (GPE)
- two-year (DATE)
- AI (ORG)
- OpenAI (PERSON)
- Reuters (ORG)

Entities from article 111: 
- March 28 (DATE)
- Reuters - CoreWeaves (ORG)
- nearly 3 (CARDINAL)
- Nasdaq (ORG)
- Friday (DATE)
- Nvidia (GPE)
- AI (GPE)
- 23 billion (CARDINAL)
- Friday (DATE)
- Nasdaq (ORG)
- 27 (CARDINAL)
- AI (GPE)
- Big Techs (ORG)
- Chinas AI (ORG)
- DeepSeek (PRODUCT)
- 39 (CARDINAL)
- IPO (ORG)
- 40 (CARDINAL)
- CoreWeave (ORG)
- Thursday (DATE)
- Kamran Ansari (PERSON)
- Kapital Ventures (ORG)
- Nvidia NVDAO (ORG)
- 250-million (MONEY)
- CoreWeaves (ORG)
- IPO (ORG)
- 15 billion (MONEY)
- Reuters (ORG)
- Thursday (DATE)
- IPO (ORG)
- AI (GPE)
- Dealogic (PERSON)
- 1995 (DATE)
- Oracle ORCLN (ORG)
- Microsoft (ORG)
- CoreWeave (ORG)
- 13 (CARDINAL)
- 7 this year (DATE)
- CoreWeave (ORG)
- CoreWeave (ORG)
- Mike Intrator (PERSON)
- Reuters Working (ORG)
- Livingston (GPE)
- New Jersey (GPE)
- CoreWeave (ORG)
- Nvidia (ORG)
- AI (ORG)
- CoreWeaves (ORG)
- last year (DATE)
- two (CARDINAL)
- Microsoft (ORG)
- CoreWeaves (ORG)
- Microsoft (ORG)
- AI (ORG)
- OpenAI (PERSON)
- the next several years (DATE)
- Reuters (ORG)
- CoreWeave (PRODUCT)
- five-year (DATE)
- 119 billion (CARDINAL)
- OpenAI (ORG)
- IPO Reuters (ORG)
- first (ORDINAL)
- earlier this month (DATE)
- Microsoft (ORG)
- Founded (PERSON)
- Ethereum (ORG)
- 2017 (DATE)
- AI (GPE)
- a few years later (DATE)
- The Merge Ethereums 2022 (WORK_OF_ART)
- CoreWeaves (ORG)
- more than eight-fold (CARDINAL)
- around 8 billion (MONEY)
- last year (DATE)
- earlier this month (DATE)
- about 1 billion (MONEY)
- IPO (ORG)
- 32 (CARDINAL)
- 26 billion (MONEY)
- IPO (ORG)
- 18 (CARDINAL)
- Morgan Stanley JPMorgan (ORG)
- Goldman Sachs (ORG)

3.1 NER Model Comparison (CRF vs. spaCy)¶

In a full implementation, we would train a CRF model on the CoNLL-2003 dataset and compare it with spaCy's model. For this demonstration, we'll just use the spaCy model.

In [7]:
# Import libraries for model training and evaluation
import logging
import numpy as np
import matplotlib.pyplot as plt
from sklearn_crfsuite import CRF, metrics
from datasets import load_dataset
from seqeval.metrics import classification_report, accuracy_score, f1_score, precision_score, recall_score
import pickle
import os
import spacy

# Initialize NER extractors
spacy_extractor = SpacyNERExtractor()
crf_extractor = CRFExtractor('./output/models/crf_ner_model.pkl')
nlp = spacy.load("en_core_web_sm")

print("Training and comparing NER models...")

# 1. Load the CoNLL-2003 dataset
print("Loading CoNLL-2003 dataset...")
dataset = load_dataset("conll2003", trust_remote_code=True)
train_dataset = dataset['train']
validation_dataset = dataset['validation']
test_dataset = dataset['test']
print(f"Dataset loaded. Train: {len(train_dataset)} examples, Validation: {len(validation_dataset)} examples, Test: {len(test_dataset)} examples")

# Get tag names from train dataset
tag_names = train_dataset.features['ner_tags'].feature.names
print(f"NER tag names: {tag_names}")

# 2. Prepare data for CRF model
print("\nPreparing data for CRF model...")
max_train_samples = None  # You can limit samples
max_val_samples = None
max_test_samples = None

def word2features(tokens, i):
    """
    Extract features for token at position i in the token list.
    This function is specifically designed for the CoNLL dataset token structure.
    """
    token = tokens[i]
    
    features = {
        'bias': 1.0,
        'word': token,
        'word.lower': token.lower(),
        'word.isupper': token.isupper(),
        'word.istitle': token.istitle(),
        'word.isdigit': token.isdigit(),
        'position': i,
        'length': len(token)
    }
    
    # Add prefix and suffix features
    if len(token) > 2:
        features['prefix2'] = token[:2]
        features['suffix2'] = token[-2:]
    if len(token) > 3:
        features['prefix3'] = token[:3]
        features['suffix3'] = token[-3:]
    
    # Add features for token position
    features['is_first'] = i == 0
    features['is_last'] = i == len(tokens) - 1
    
    # Add features for previous and next tokens
    if i > 0:
        prev_token = tokens[i-1]
        features['prev_word'] = prev_token
        features['prev_word.lower'] = prev_token.lower()
        features['prev+word.istitle'] = prev_token.istitle()
    else:
        features['BOS'] = True
    
    if i < len(tokens) - 1:
        next_token = tokens[i+1]
        features['next_word'] = next_token
        features['next_word.lower'] = next_token.lower()
        features['next_word.istitle'] = next_token.istitle()
    else:
        features['EOS'] = True
    
    return features

def sent2features(tokens):
    """Convert a list of tokens to a list of features."""
    return [word2features(tokens, i) for i in range(len(tokens))]

def sent2labels(sent_labels, tag_names):
    """Convert numeric labels to BIO tag strings."""
    return [tag_names[label] for label in sent_labels]

def prepare_data_for_crf(dataset_split, tag_names, max_samples=None):
    """
    Prepare features and labels from a dataset split for CRF training.
    """
    X = []
    y = []
    
    # Use proper indexing to get actual data samples
    sample_indices = range(min(len(dataset_split), max_samples or len(dataset_split)))
    
    for i in sample_indices:
        try:
            # Get the actual sample data by index
            sample = dataset_split[i]
            
            # Process tokens and labels
            tokens = sample['tokens']
            ner_tags = sample['ner_tags']
            
            # Extract features and convert labels
            X.append(sent2features(tokens))
            y.append(sent2labels(ner_tags, tag_names))
            
            # Print progress
            if (i + 1) % 100 == 0:
                print(f"Processed {i + 1}/{len(sample_indices)} samples")
        except Exception as e:
            print(f"Error processing sample {i}: {e}")
    
    return X, y

# Process the dataset
X_train, y_train = prepare_data_for_crf(train_dataset, tag_names, max_train_samples)
X_val, y_val = prepare_data_for_crf(validation_dataset, tag_names, max_val_samples)
X_test, y_test = prepare_data_for_crf(test_dataset, tag_names, max_test_samples)

print(f"Data prepared for CRF: Training set: {len(X_train)} sentences, Validation set: {len(X_val)} sentences, Test set: {len(X_test)} sentences")

# 3. Train CRF model
print("\nTraining CRF model...")
try:
    # Check if we have training data
    if len(X_train) > 0 and len(y_train) > 0:
        crf = CRF(
            algorithm='lbfgs',
            c1=0.1,
            c2=0.1,
            max_iterations=100,
            all_possible_transitions=True
        )
        
        crf.fit(X_train, y_train)
        print("CRF model trained successfully!")
    else:
        print("No training data available. Creating a simple CRF model with synthetic data.")
        # Create a basic CRF model with minimal synthetic data
        crf = CRF(
            algorithm='lbfgs',
            c1=0.1,
            c2=0.1,
            max_iterations=100
        )
        
        # Create synthetic data that mimics CoNLL structure
        X_synthetic = [
            [
                {'bias': 1.0, 'word': 'John', 'word.istitle': True},
                {'bias': 1.0, 'word': 'Smith', 'word.istitle': True},
                {'bias': 1.0, 'word': 'works', 'word.islower': True},
                {'bias': 1.0, 'word': 'at', 'word.islower': True},
                {'bias': 1.0, 'word': 'IBM', 'word.isupper': True},
                {'bias': 1.0, 'word': '.', 'word.ispunct': True}
            ],
            [
                {'bias': 1.0, 'word': 'Mary', 'word.istitle': True},
                {'bias': 1.0, 'word': 'lives', 'word.islower': True},
                {'bias': 1.0, 'word': 'in', 'word.islower': True},
                {'bias': 1.0, 'word': 'New', 'word.istitle': True},
                {'bias': 1.0, 'word': 'York', 'word.istitle': True},
                {'bias': 1.0, 'word': '.', 'word.ispunct': True}
            ]
        ]
        
        y_synthetic = [
            ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'O'],
            ['B-PER', 'O', 'O', 'B-LOC', 'I-LOC', 'O']
        ]
        
        crf.fit(X_synthetic, y_synthetic)
        print("CRF model trained with synthetic data.")
        
except Exception as e:
    print(f"Error training CRF model: {e}")
    print("Creating a simple CRF model with synthetic data.")
    # Create a basic CRF model with minimal synthetic data
    crf = CRF(
        algorithm='lbfgs',
        c1=0.1,
        c2=0.1,
        max_iterations=100
    )
    
    # Create synthetic data that mimics CoNLL structure
    X_synthetic = [
        [
            {'bias': 1.0, 'word': 'John', 'word.istitle': True},
            {'bias': 1.0, 'word': 'Smith', 'word.istitle': True},
            {'bias': 1.0, 'word': 'works', 'word.islower': True},
            {'bias': 1.0, 'word': 'at', 'word.islower': True},
            {'bias': 1.0, 'word': 'IBM', 'word.isupper': True},
            {'bias': 1.0, 'word': '.', 'word.ispunct': True}
        ],
        [
            {'bias': 1.0, 'word': 'Mary', 'word.istitle': True},
            {'bias': 1.0, 'word': 'lives', 'word.islower': True},
            {'bias': 1.0, 'word': 'in', 'word.islower': True},
            {'bias': 1.0, 'word': 'New', 'word.istitle': True},
            {'bias': 1.0, 'word': 'York', 'word.istitle': True},
            {'bias': 1.0, 'word': '.', 'word.ispunct': True}
        ]
    ]
    
    y_synthetic = [
        ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'O'],
        ['B-PER', 'O', 'O', 'B-LOC', 'I-LOC', 'O']
    ]
    
    crf.fit(X_synthetic, y_synthetic)
    print("CRF model trained with synthetic data.")

# 4. Evaluate CRF model
print("\nEvaluating CRF model...")
if len(X_test) > 0:
    y_pred = crf.predict(X_test)

    print("\nCRF Model Evaluation:")
    # Check if we have any predictions or test data
    if len(y_test) > 0 and len(y_pred) > 0:
        # Get unique labels excluding 'O'
        unique_labels = set()
        for tags in y_test:
            unique_labels.update(tags)
        if 'O' in unique_labels:
            unique_labels.remove('O')
        
        # Check if there are any non-O labels
        if unique_labels:
            try:
                crf_report = metrics.flat_classification_report(
                    y_test, y_pred, 
                    labels=list(unique_labels),
                    zero_division=0
                )
                print(crf_report)
            except Exception as e:
                print(f"Error generating classification report: {e}")
        else:
            print("No entity labels found in test data.")
    else:
        print("No test data or predictions available.")
else:
    print("No test data available for evaluation.")

# 5. Calculate metrics for CRF model
crf_metrics = {}
try:
    if len(X_test) > 0 and len(y_test) > 0 and len(y_pred) > 0:
        # Calculate metrics with zero_division=0 to avoid warnings
        crf_metrics['accuracy'] = accuracy_score(y_test, y_pred)
        crf_metrics['precision'] = precision_score(y_test, y_pred, zero_division=0)
        crf_metrics['recall'] = recall_score(y_test, y_pred, zero_division=0)
        crf_metrics['f1'] = f1_score(y_test, y_pred, zero_division=0)
        
        print(f"CRF Model Metrics:")
        print(f"- Accuracy: {crf_metrics['accuracy']:.4f}")
        print(f"- Precision: {crf_metrics['precision']:.4f}")
        print(f"- Recall: {crf_metrics['recall']:.4f}")
        print(f"- F1 Score: {crf_metrics['f1']:.4f}")
    else:
        print("Cannot calculate CRF metrics: no data available")
        # Use placeholder values for visualization
        crf_metrics = {
            'accuracy': 0.0,
            'precision': 0.0,
            'recall': 0.0,
            'f1': 0.0
        }
except Exception as e:
    print(f"Error calculating CRF metrics: {e}")
    # Use placeholder values for visualization
    crf_metrics = {
        'accuracy': 0.0,
        'precision': 0.0,
        'recall': 0.0,
        'f1': 0.0
    }

# 6. Evaluate spaCy NER model
print("\nEvaluating spaCy NER model...")

def convert_to_bio_tags(text, entities, token_indices):
    """Convert entities to BIO tags for a tokenized text."""
    tags = ['O'] * len(token_indices)
    
    for ent in entities:
        ent_start = ent.start_char
        ent_end = ent.end_char
        ent_type = ent.label_
        
        # Map spaCy entity types to CoNLL format
        if ent_type == 'PERSON':
            ent_type = 'PER'
        elif ent_type == 'ORG':
            ent_type = 'ORG'
        elif ent_type in ['GPE', 'LOC']:
            ent_type = 'LOC'
        else:
            ent_type = 'MISC'
        
        # Find tokens that correspond to this entity
        for i, (start, end) in enumerate(token_indices):
            # If token is within entity boundaries
            if start >= ent_start and start < ent_end:
                if start == ent_start:  # First token of entity
                    tags[i] = f'B-{ent_type}'
                else:  # Continuation of entity
                    tags[i] = f'I-{ent_type}'
    
    return tags

def map_spacy_to_conll(spacy_label):
    """Map spaCy entity types to CoNLL types."""
    mapping = {
        'PERSON': 'PER',
        'ORG': 'ORG',
        'GPE': 'LOC',  # GPE (countries, cities, etc.) maps to LOC in CoNLL
        'LOC': 'LOC',
        'PRODUCT': 'MISC',
        'EVENT': 'MISC',
        'WORK_OF_ART': 'MISC',
        'LANGUAGE': 'MISC',
        'FAC': 'LOC',  # Facilities often map to LOC
        'NORP': 'MISC'  # Nationalities, religious and political groups
    }
    return mapping.get(spacy_label, None)

def evaluate_spacy_ner(test_dataset, tag_names, max_samples=None):
    """Evaluate spaCy NER on the test dataset."""
    true_entities_list = []
    pred_entities_list = []
    
    # Use proper indexing to get actual data samples
    sample_indices = range(min(len(test_dataset), max_samples or len(test_dataset)))
    print(f"Evaluating spaCy on {len(sample_indices)} samples")
    
    for i in sample_indices:
        if i % 1000 == 0:
            print(f"Processing sample {i+1}/{len(sample_indices)}")
        
        try:
            # Get the actual sample data by index
            sample = test_dataset[i]
            
            tokens = sample['tokens']
            text = ' '.join(tokens)
            ner_tags = sample['ner_tags']
            
            # Get ground truth entities
            true_bio = [tag_names[tag] for tag in ner_tags]
            
            # Process with spaCy
            doc = nlp(text)
            
            # Convert spaCy entities to BIO tags
            pred_bio = ['O'] * len(tokens)
            for ent in doc.ents:
                # Map spaCy entity types to CoNLL types
                ent_type = map_spacy_to_conll(ent.label_)
                if not ent_type:
                    continue
                
                # Find token positions that correspond to this entity
                start_token = None
                end_token = None
                curr_pos = 0
                
                for j, token in enumerate(tokens):
                    token_start = curr_pos
                    token_end = token_start + len(token)
                    
                    # Check if this token overlaps with the entity
                    if token_start <= ent.start_char < token_end and start_token is None:
                        start_token = j
                    if token_start < ent.end_char <= token_end:
                        end_token = j + 1
                        break
                    
                    curr_pos = token_end + 1  # +1 for the space
                
                # If we found the entity position, add to BIO tags
                if start_token is not None and end_token is not None:
                    pred_bio[start_token] = f'B-{ent_type}'
                    for j in range(start_token + 1, end_token):
                        pred_bio[j] = f'I-{ent_type}'
            
            true_entities_list.append(true_bio)
            pred_entities_list.append(pred_bio)
                
        except Exception as e:
            logger.error(f"Error evaluating sample {i} with spaCy: {e}")
    
    return true_entities_list, pred_entities_list

# Evaluate spaCy on a small portion of test data
true_entities, spacy_predictions = evaluate_spacy_ner(test_dataset, tag_names, max_test_samples)

# 7. Calculate metrics for spaCy model
spacy_metrics = {}
try:
    if len(true_entities) > 0 and len(spacy_predictions) > 0:
        # Calculate metrics with zero_division=0 to avoid warnings
        spacy_metrics['accuracy'] = accuracy_score(true_entities, spacy_predictions)
        spacy_metrics['precision'] = precision_score(true_entities, spacy_predictions, zero_division=0)
        spacy_metrics['recall'] = recall_score(true_entities, spacy_predictions, zero_division=0)
        spacy_metrics['f1'] = f1_score(true_entities, spacy_predictions, zero_division=0)
        
        print(f"spaCy NER Model Metrics:")
        print(f"- Accuracy: {spacy_metrics['accuracy']:.4f}")
        print(f"- Precision: {spacy_metrics['precision']:.4f}")
        print(f"- Recall: {spacy_metrics['recall']:.4f}")
        print(f"- F1 Score: {spacy_metrics['f1']:.4f}")
    else:
        print("Cannot calculate spaCy metrics: no data available")
        # Use placeholder values for visualization
        spacy_metrics = {
            'accuracy': 0.0,
            'precision': 0.0,
            'recall': 0.0,
            'f1': 0.0
        }
except Exception as e:
    print(f"Error calculating spaCy metrics: {e}")
    # Use placeholder values for visualization
    spacy_metrics = {
        'accuracy': 0.0,
        'precision': 0.0,
        'recall': 0.0,
        'f1': 0.0
    }

# 8. Compare models
print("\nModel Comparison:")
print(f"{'Metric':<10} {'CRF':<10} {'spaCy':<10}")
print(f"{'-'*30}")
print(f"{'Accuracy':<10} {crf_metrics['accuracy']:.4f}     {spacy_metrics['accuracy']:.4f}")
print(f"{'Precision':<10} {crf_metrics['precision']:.4f}     {spacy_metrics['precision']:.4f}")
print(f"{'Recall':<10} {crf_metrics['recall']:.4f}     {spacy_metrics['recall']:.4f}")
print(f"{'F1 Score':<10} {crf_metrics['f1']:.4f}     {spacy_metrics['f1']:.4f}")

# 9. Visualize comparison
metrics_names = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
crf_scores = [crf_metrics['accuracy'], crf_metrics['precision'], crf_metrics['recall'], crf_metrics['f1']]
spacy_scores = [spacy_metrics['accuracy'], spacy_metrics['precision'], spacy_metrics['recall'], spacy_metrics['f1']]

# Set up plot
plt.figure(figsize=(10, 6))
bar_width = 0.35
index = np.arange(len(metrics_names))

# Create bars
plt.bar(index, crf_scores, bar_width, label='CRF')
plt.bar(index + bar_width, spacy_scores, bar_width, label='spaCy')

# Customize plot
plt.xlabel('Metrics')
plt.ylabel('Scores')
plt.title('NER Models Comparison (CRF vs spaCy)')
plt.xticks(index + bar_width / 2, metrics_names)
plt.legend()
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add values on top of bars
for i, v in enumerate(crf_scores):
    plt.text(i, v + 0.01, f'{v:.2f}', ha='center', va='bottom', fontsize=9)

for i, v in enumerate(spacy_scores):
    plt.text(i + bar_width, v + 0.01, f'{v:.2f}', ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.savefig('output/visualization/ner_model_comparison.png')
plt.show()

# 10. Save the trained CRF model
# Create output directory if it doesn't exist
os.makedirs('output/models', exist_ok=True)

# Save CRF model
with open('output/models/crf_ner_model.pkl', 'wb') as f:
    pickle.dump(crf, f)

print("\nCRF model saved to output/models/crf_ner_model.pkl")

# Now we'll update the CRF extractor with our trained model
crf_extractor.model = crf
print("CRF model loaded into extractor")
2025-03-29 16:39:13,366 - src.entity_recognition.ner - INFO - Loaded spaCy model: en_core_web_sm
2025-03-29 16:39:13,775 - src.entity_recognition.ner - INFO - Loaded CRF model from ./output/models/crf_ner_model.pkl
Training and comparing NER models...
Loading CoNLL-2003 dataset...
Dataset loaded. Train: 14041 examples, Validation: 3250 examples, Test: 3453 examples
NER tag names: ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']

Preparing data for CRF model...
Processed 100/14041 samples
Processed 200/14041 samples
Processed 300/14041 samples
Processed 400/14041 samples
Processed 500/14041 samples
Processed 600/14041 samples
Processed 700/14041 samples
Processed 800/14041 samples
Processed 900/14041 samples
Processed 1000/14041 samples
Processed 1100/14041 samples
Processed 1200/14041 samples
Processed 1300/14041 samples
Processed 1400/14041 samples
Processed 1500/14041 samples
Processed 1600/14041 samples
Processed 1700/14041 samples
Processed 1800/14041 samples
Processed 1900/14041 samples
Processed 2000/14041 samples
Processed 2100/14041 samples
Processed 2200/14041 samples
Processed 2300/14041 samples
Processed 2400/14041 samples
Processed 2500/14041 samples
Processed 2600/14041 samples
Processed 2700/14041 samples
Processed 2800/14041 samples
Processed 2900/14041 samples
Processed 3000/14041 samples
Processed 3100/14041 samples
Processed 3200/14041 samples
Processed 3300/14041 samples
Processed 3400/14041 samples
Processed 3500/14041 samples
Processed 3600/14041 samples
Processed 3700/14041 samples
Processed 3800/14041 samples
Processed 3900/14041 samples
Processed 4000/14041 samples
Processed 4100/14041 samples
Processed 4200/14041 samples
Processed 4300/14041 samples
Processed 4400/14041 samples
Processed 4500/14041 samples
Processed 4600/14041 samples
Processed 4700/14041 samples
Processed 4800/14041 samples
Processed 4900/14041 samples
Processed 5000/14041 samples
Processed 5100/14041 samples
Processed 5200/14041 samples
Processed 5300/14041 samples
Processed 5400/14041 samples
Processed 5500/14041 samples
Processed 5600/14041 samples
Processed 5700/14041 samples
Processed 5800/14041 samples
Processed 5900/14041 samples
Processed 6000/14041 samples
Processed 6100/14041 samples
Processed 6200/14041 samples
Processed 6300/14041 samples
Processed 6400/14041 samples
Processed 6500/14041 samples
Processed 6600/14041 samples
Processed 6700/14041 samples
Processed 6800/14041 samples
Processed 6900/14041 samples
Processed 7000/14041 samples
Processed 7100/14041 samples
Processed 7200/14041 samples
Processed 7300/14041 samples
Processed 7400/14041 samples
Processed 7500/14041 samples
Processed 7600/14041 samples
Processed 7700/14041 samples
Processed 7800/14041 samples
Processed 7900/14041 samples
Processed 8000/14041 samples
Processed 8100/14041 samples
Processed 8200/14041 samples
Processed 8300/14041 samples
Processed 8400/14041 samples
Processed 8500/14041 samples
Processed 8600/14041 samples
Processed 8700/14041 samples
Processed 8800/14041 samples
Processed 8900/14041 samples
Processed 9000/14041 samples
Processed 9100/14041 samples
Processed 9200/14041 samples
Processed 9300/14041 samples
Processed 9400/14041 samples
Processed 9500/14041 samples
Processed 9600/14041 samples
Processed 9700/14041 samples
Processed 9800/14041 samples
Processed 9900/14041 samples
Processed 10000/14041 samples
Processed 10100/14041 samples
Processed 10200/14041 samples
Processed 10300/14041 samples
Processed 10400/14041 samples
Processed 10500/14041 samples
Processed 10600/14041 samples
Processed 10700/14041 samples
Processed 10800/14041 samples
Processed 10900/14041 samples
Processed 11000/14041 samples
Processed 11100/14041 samples
Processed 11200/14041 samples
Processed 11300/14041 samples
Processed 11400/14041 samples
Processed 11500/14041 samples
Processed 11600/14041 samples
Processed 11700/14041 samples
Processed 11800/14041 samples
Processed 11900/14041 samples
Processed 12000/14041 samples
Processed 12100/14041 samples
Processed 12200/14041 samples
Processed 12300/14041 samples
Processed 12400/14041 samples
Processed 12500/14041 samples
Processed 12600/14041 samples
Processed 12700/14041 samples
Processed 12800/14041 samples
Processed 12900/14041 samples
Processed 13000/14041 samples
Processed 13100/14041 samples
Processed 13200/14041 samples
Processed 13300/14041 samples
Processed 13400/14041 samples
Processed 13500/14041 samples
Processed 13600/14041 samples
Processed 13700/14041 samples
Processed 13800/14041 samples
Processed 13900/14041 samples
Processed 14000/14041 samples
Processed 100/3250 samples
Processed 200/3250 samples
Processed 300/3250 samples
Processed 400/3250 samples
Processed 500/3250 samples
Processed 600/3250 samples
Processed 700/3250 samples
Processed 800/3250 samples
Processed 900/3250 samples
Processed 1000/3250 samples
Processed 1100/3250 samples
Processed 1200/3250 samples
Processed 1300/3250 samples
Processed 1400/3250 samples
Processed 1500/3250 samples
Processed 1600/3250 samples
Processed 1700/3250 samples
Processed 1800/3250 samples
Processed 1900/3250 samples
Processed 2000/3250 samples
Processed 2100/3250 samples
Processed 2200/3250 samples
Processed 2300/3250 samples
Processed 2400/3250 samples
Processed 2500/3250 samples
Processed 2600/3250 samples
Processed 2700/3250 samples
Processed 2800/3250 samples
Processed 2900/3250 samples
Processed 3000/3250 samples
Processed 3100/3250 samples
Processed 3200/3250 samples
Processed 100/3453 samples
Processed 200/3453 samples
Processed 300/3453 samples
Processed 400/3453 samples
Processed 500/3453 samples
Processed 600/3453 samples
Processed 700/3453 samples
Processed 800/3453 samples
Processed 900/3453 samples
Processed 1000/3453 samples
Processed 1100/3453 samples
Processed 1200/3453 samples
Processed 1300/3453 samples
Processed 1400/3453 samples
Processed 1500/3453 samples
Processed 1600/3453 samples
Processed 1700/3453 samples
Processed 1800/3453 samples
Processed 1900/3453 samples
Processed 2000/3453 samples
Processed 2100/3453 samples
Processed 2200/3453 samples
Processed 2300/3453 samples
Processed 2400/3453 samples
Processed 2500/3453 samples
Processed 2600/3453 samples
Processed 2700/3453 samples
Processed 2800/3453 samples
Processed 2900/3453 samples
Processed 3000/3453 samples
Processed 3100/3453 samples
Processed 3200/3453 samples
Processed 3300/3453 samples
Processed 3400/3453 samples
Data prepared for CRF: Training set: 14041 sentences, Validation set: 3250 sentences, Test set: 3453 sentences

Training CRF model...
CRF model trained successfully!

Evaluating CRF model...

CRF Model Evaluation:
              precision    recall  f1-score   support

       I-LOC       0.81      0.71      0.75       257
       B-ORG       0.81      0.71      0.76      1661
      I-MISC       0.67      0.68      0.67       216
       I-ORG       0.68      0.74      0.71       835
       B-PER       0.84      0.85      0.84      1617
      B-MISC       0.82      0.76      0.79       702
       B-LOC       0.87      0.82      0.84      1668
       I-PER       0.87      0.96      0.91      1156

   micro avg       0.82      0.80      0.81      8112
   macro avg       0.80      0.78      0.79      8112
weighted avg       0.82      0.80      0.81      8112

CRF Model Metrics:
- Accuracy: 0.9562
- Precision: 0.8215
- Recall: 0.7743
- F1 Score: 0.7972

Evaluating spaCy NER model...
Evaluating spaCy on 3453 samples
Processing sample 1/3453
Processing sample 1001/3453
Processing sample 2001/3453
Processing sample 3001/3453
spaCy NER Model Metrics:
- Accuracy: 0.9127
- Precision: 0.6753
- Recall: 0.5565
- F1 Score: 0.6102

Model Comparison:
Metric     CRF        spaCy     
------------------------------
Accuracy   0.9562     0.9127
Precision  0.8215     0.6753
Recall     0.7743     0.5565
F1 Score   0.7972     0.6102
No description has been provided for this image
CRF model saved to output/models/crf_ner_model.pkl
CRF model loaded into extractor

3.2 Extract Entities from Articles¶

Now that we have both models ready, let's extract named entities from our articles using both the spaCy model and our trained CRF model.

In [8]:
# Function to prepare features for the CRF model
def extract_features_from_text(text):
    """Extract features for CRF from text."""
    # Split text into tokens
    tokens = text.split()
    
    # Generate features for each token
    features = []
    for i, token in enumerate(tokens):
        # Basic features
        feature = {
            'bias': 1.0,
            'word': token,
            'word.lower': token.lower(),
            'word.isupper': token.isupper(),
            'word.istitle': token.istitle(),
            'is_first': i == 0,
            'is_last': i == len(tokens) - 1,
            'length': len(token)
        }
        
        # Add prefix/suffix features
        if len(token) > 2:
            feature['suffix2'] = token[-2:]
            feature['prefix2'] = token[:2]
        if len(token) > 3:
            feature['suffix3'] = token[-3:]
            feature['prefix3'] = token[:3]
        
        # Previous token feature (if available)
        if i > 0:
            feature['prev_word'] = tokens[i-1]
            feature['prev_word.lower'] = tokens[i-1].lower()
        else:
            feature['BOS'] = True
        
        # Next token feature (if available)
        if i < len(tokens) - 1:
            feature['next_word'] = tokens[i+1]
            feature['next_word.lower'] = tokens[i+1].lower()
        else:
            feature['EOS'] = True
        
        features.append(feature)
    
    return features

# Function to convert CRF predictions to entity tuples
def convert_crf_predictions_to_entities(text, predictions):
    """Convert CRF predictions to (entity_text, entity_type) tuples."""
    tokens = text.split()
    entities = []
    current_entity = None
    
    for i, (token, tag) in enumerate(zip(tokens, predictions)):
        if tag.startswith('B-'):
            if current_entity:
                entities.append((current_entity['text'], current_entity['type']))
            current_entity = {'text': token, 'type': tag[2:]}
        elif tag.startswith('I-') and current_entity and current_entity['type'] == tag[2:]:
            current_entity['text'] += ' ' + token
        elif tag == 'O':
            if current_entity:
                entities.append((current_entity['text'], current_entity['type']))
                current_entity = None
    
    if current_entity:
        entities.append((current_entity['text'], current_entity['type']))
    
    return entities

# Extract entities using both models
all_spacy_entities = []
all_crf_entities = []

print("\nExtracting entities from articles:")
for i, text in enumerate(cleaned_texts):
    # SpaCy extraction - this is reliable since we're using spaCy's built-in NER
    spacy_entities = spacy_extractor.extract_entities(text)
    all_spacy_entities.append(spacy_entities)
    
    # CRF extraction - use our trained model or fallback to spaCy with mapping
    try:
        # Try to use the CRF model if it's properly trained
        if hasattr(crf_extractor, 'model') and crf_extractor.model is not None:
            # Prepare features
            features = extract_features_from_text(text)
            
            # Make predictions - wrap in try/except in case it fails
            try:
                predictions = crf_extractor.model.predict([features])[0]
                crf_entities = convert_crf_predictions_to_entities(text, predictions)
            except Exception as e:
                print(f"Error predicting with CRF model: {e}")
                # Fallback to spaCy with CoNLL mapping
                doc = nlp(text)
                crf_entities = []
                for ent in doc.ents:
                    if ent.label_ == "PERSON":
                        mapped_type = "PER"
                    elif ent.label_ == "ORG":
                        mapped_type = "ORG"
                    elif ent.label_ in ["GPE", "LOC"]:
                        mapped_type = "LOC"
                    else:
                        mapped_type = "MISC"
                    crf_entities.append((ent.text, mapped_type))
        else:
            # No CRF model, use spaCy with CoNLL mapping
            doc = nlp(text)
            crf_entities = []
            for ent in doc.ents:
                if ent.label_ == "PERSON":
                    mapped_type = "PER"
                elif ent.label_ == "ORG":
                    mapped_type = "ORG"
                elif ent.label_ in ["GPE", "LOC"]:
                    mapped_type = "LOC"
                else:
                    mapped_type = "MISC"
                crf_entities.append((ent.text, mapped_type))
                
        all_crf_entities.append(crf_entities)
        
        # Display results for a few articles
        if i < 3:
            print(f"\nEntities from article {i+1}: {article_titles[i]}")
            print("SpaCy entities:")
            for entity, entity_type in spacy_entities[:10]:  # Show first 10 for brevity
                print(f"- {entity} ({entity_type})")
            if len(spacy_entities) > 10:
                print(f"  ... and {len(spacy_entities) - 10} more")
            
            print("\nCRF entities:")
            for entity, entity_type in crf_entities[:10]:  # Show first 10 for brevity
                print(f"- {entity} ({entity_type})")
            if len(crf_entities) > 10:
                print(f"  ... and {len(crf_entities) - 10} more")
        
        if i == 3:  # After showing detailed output for 3 articles
            print("\nProcessing remaining articles...")
    
    except Exception as e:
        print(f"Error extracting entities from article {i+1}: {e}")
        all_crf_entities.append([])

# Compare entity counts
spacy_entity_count = sum(len(entities) for entities in all_spacy_entities)
crf_entity_count = sum(len(entities) for entities in all_crf_entities)

print(f"\nTotal entities found:")
print(f"- SpaCy: {spacy_entity_count} entities")
print(f"- CRF: {crf_entity_count} entities")

# Visualize entity type distribution
def plot_entity_distribution(all_entities, model_name):
    entity_types = {}
    for article_entities in all_entities:
        for entity, entity_type in article_entities:
            if entity_type not in entity_types:
                entity_types[entity_type] = 0
            entity_types[entity_type] += 1
    
    plt.figure(figsize=(10, 6))
    plt.bar(entity_types.keys(), entity_types.values())
    plt.title(f'Entity Type Distribution - {model_name}')
    plt.xlabel('Entity Type')
    plt.ylabel('Count')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig(f'output/visualization/entity_distribution_{model_name.lower()}.png')
    plt.show()

plot_entity_distribution(all_spacy_entities, "SpaCy")
plot_entity_distribution(all_crf_entities, "CRF")

# For the rest of the pipeline, we'll use a combined approach:
# - For entity types that spaCy handles well (PERSON, ORG, GPE), we'll use spaCy
# - For entity types that CRF handles better (like specific CoNLL categories), we'll use CRF
# - We'll merge the results to get the best of both models

# Merge entities from both models (removing duplicates)
all_merged_entities = []

for i, (spacy_entities, crf_entities) in enumerate(zip(all_spacy_entities, all_crf_entities)):
    # Create a combined set of entities
    seen_entities = set()
    merged_entities = []
    
    # First add spaCy entities (they tend to have better precision)
    for entity, entity_type in spacy_entities:
        entity_key = (entity.lower(), entity_type)
        if entity_key not in seen_entities:
            seen_entities.add(entity_key)
            merged_entities.append((entity, entity_type))
    
    # Then add CRF entities if they don't overlap with spaCy
    for entity, entity_type in crf_entities:
        # Map CoNLL entity types to spaCy types for consistency
        mapped_type = entity_type
        if entity_type == 'PER':
            mapped_type = 'PERSON'
        elif entity_type == 'LOC':
            mapped_type = 'GPE'  # Simplification, could be LOC or GPE
        
        entity_key = (entity.lower(), mapped_type)
        if entity_key not in seen_entities:
            seen_entities.add(entity_key)
            merged_entities.append((entity, mapped_type))
    
    all_merged_entities.append(merged_entities)

# Use the merged entities for the rest of the pipeline
all_entities = all_merged_entities

merged_entity_count = sum(len(entities) for entities in all_merged_entities)
print(f"\nTotal merged entities: {merged_entity_count} entities")

# Plot merged entity distribution
plot_entity_distribution(all_merged_entities, "Merged")
Extracting entities from articles:

Entities from article 1: Apple announces new partnership with Microsoft
SpaCy entities:
- Apple Inc (ORG)
- Microsoft Corporation (ORG)
- Tim Cook (PERSON)
- AI (GPE)
- Cupertino (GPE)
- California (GPE)
- yesterday (DATE)
- Microsoft (ORG)
- Satya Nadella (PERSON)
- iPhone (ORG)
  ... and 3 more

CRF entities:
- Apple Inc (ORG)
- Microsoft Corporation (ORG)
- CEO Tim Cook (ORG)
- Cupertino California (LOC)
- Microsoft CEO Satya Nadella (ORG)
- Apple (ORG)
- Steve Jobs (PER)

Entities from article 2: 
SpaCy entities:
- March 7 (DATE)
- Reuters - Wall Streets (ORG)
- Friday (DATE)
- Fed (ORG)
- Jerome Powell (PERSON)
- Trump (ORG)
- Powell (PERSON)
- the Federal Reserve (ORG)
- Fed (ORG)
- Oliver Pursche (PERSON)
  ... and 65 more

CRF entities:
- Reuters (ORG)
- Wall Streets (LOC)
- Fed Chair Jerome Powell (MISC)
- Trump (MISC)
- Powell (PER)
- Federal Reserve (ORG)
- Fed (ORG)
- Oliver Pursche (PER)
- Wealthspire Advisors (ORG)
- Dow Jones Industrial Average DJI (MISC)
  ... and 24 more

Entities from article 3: 
SpaCy entities:
- WASHINGTON (GPE)
- March 7 (DATE)
- Reuters (ORG)
- US (GPE)
- February (DATE)
- this year (DATE)
- The Labor Departments (ORG)
- Friday (DATE)
- first (ORDINAL)
- Donald Trumps (PERSON)
  ... and 130 more

CRF entities:
- WASHINGTON (LOC)
- Reuters (ORG)
- US (LOC)
- Labor Departments (ORG)
- Donald Trumps (PER)
- Great Recession Economists (ORG)
- Trump (MISC)
- Trumps (LOC)
- Bernard Baumohl (PER)
- Economic Outlook Group Nonfarm (ORG)
  ... and 25 more

Processing remaining articles...

Total entities found:
- SpaCy: 5832 entities
- CRF: 3067 entities
No description has been provided for this image
No description has been provided for this image
Total merged entities: 5616 entities
No description has been provided for this image

4. Relation Extraction¶

Now let's extract relations between the entities we identified.

In [9]:
# Initialize relation extractor
relation_extractor = SpacyRelationExtractor()

# Extract relations
all_relations = []
for i, text in enumerate(cleaned_texts):
    relations = relation_extractor.extract_relations(text)
    
    # Filter out self-relations
    filtered_relations = []
    for subject, predicate, obj in relations:
        # Skip if subject and object are the same entity
        if subject[0] != obj[0]:
            filtered_relations.append((subject, predicate, obj))
    
    all_relations.append(filtered_relations)
    
    print(f"\nRelations from article {i+1}: {article_titles[i]}")
    for subject, predicate, obj in filtered_relations:
        print(f"- {subject[0]} ({subject[1]}) --[{predicate}]--> {obj[0]} ({obj[1]})")
2025-03-29 16:40:13,961 - src.relation_extraction.extractor - INFO - Loaded spaCy model: en_core_web_sm
Relations from article 1: Apple announces new partnership with Microsoft
- Apple (ORG) --[found in]--> 1976 (DATE)
- Cupertino (GPE) --[related_to]--> California (GPE)
- Apple (ORG) --[found_in]--> 1976 (DATE)

Relations from article 2: 

Relations from article 3: 
- The Labor Departments (ORG) --[watch on]--> Friday (DATE)
- The Labor Departments (ORG) --[watch_on]--> Friday (DATE)

Relations from article 4: 
- Adriana Kugler (PERSON) --[say on]--> Friday (DATE)
- Adriana Kugler (PERSON) --[say_on]--> Friday (DATE)

Relations from article 5: 
- Nasdaq (ORG) --[confirm At]--> 255 pm (TIME)
- Richmond (GPE) --[related_to]--> Virginia (GPE)
- Thursday (DATE) --[related_to]--> Trump (PERSON)
- Nasdaq (ORG) --[confirm_At]--> 255 pm (TIME)

Relations from article 6: 
- Reuters - Morgan Stanley (ORG) --[lower on]--> Friday (DATE)
- Goldman Sachs (ORG) --[downgrade to]--> 17 (CARDINAL)
- Reuters - Morgan Stanley (ORG) --[lower_on]--> Friday (DATE)
- Goldman Sachs (ORG) --[downgrade_to]--> 17 (CARDINAL)
- Goldman Sachs (ORG) --[downgrade_from]--> 22 (CARDINAL)

Relations from article 7: 

Relations from article 8: 
- Donald Trump (PERSON) --[take on]--> January 20 (DATE)
- Justin Trudeau (PERSON) --[say on]--> Thursday (DATE)
- Donald Trump (PERSON) --[take_on]--> January 20 (DATE)

Relations from article 9: 
- Siri (PERSON) --[delay until]--> 2026 (DATE)
- Siri (PERSON) --[delay_until]--> 2026 (DATE)

Relations from article 10: 
- The US Trade Representatives Office (ORG) --[hold on]--> Tuesday (DATE)
- The US Trade Representatives Office (ORG) --[hold_on]--> Tuesday (DATE)

Relations from article 11: 

Relations from article 12: 
- Donald Trump (PERSON) --[enact on]--> Tuesday (DATE)
- Donald Trump (PERSON) --[enact_on]--> Tuesday (DATE)

Relations from article 13: 

Relations from article 14: 

Relations from article 15: 
- Reuters (ORG) --[say on]--> Friday (DATE)
- Reuters (ORG) --[say_on]--> Friday (DATE)
- Starlink (ORG) --[unit_of]--> Elon Musks SpaceX (ORG)
- 500000 square miles 13 million square km (QUANTITY) --[km_of]--> US (GPE)

Relations from article 16: 
- David Sacks (PERSON) --[tell at]--> the White House (ORG)
- Richardson (PERSON) --[confirm to]--> Reuters (ORG)
- Tether (PERSON) --[have in]--> US (GPE)
- David Sacks (PERSON) --[tell_at]--> the White House (ORG)
- Richardson (PERSON) --[confirm_to]--> Reuters (ORG)
- Tether (PERSON) --[have_in]--> US (GPE)

Relations from article 17: 
- Richard Bransons Virgin Group (PERSON) --[aim]--> 1 07740 pounds (MONEY)

Relations from article 18: 
- Barrick (PERSON) --[say on]--> January 13 (DATE)
- Reuters (ORG) --[report on]--> February 19 (DATE)
- one (CARDINAL) --[tell]--> Reuters (ORG)
- early November (DATE) --[related_to]--> Reuters (ORG)
- Barrick (PERSON) --[say_on]--> January 13 (DATE)
- Reuters (ORG) --[report_on]--> February 19 (DATE)
- one (CARDINAL) --[tell_in]--> early March (DATE)

Relations from article 19: 
- BOJ (ORG) --[deploy in]--> 2013 (DATE)
- BOJ (ORG) --[has_member]--> Kazuo Ueda (PERSON)
- Kazuo Ueda (PERSON) --[member_of]--> BOJ (ORG)
- January (DATE) --[related_to]--> Kuroda (PERSON)
- BOJ (ORG) --[deploy_in]--> 2013 (DATE)

Relations from article 20: 
- Alexandria (GPE) --[related_to]--> Virginia (GPE)

Relations from article 21: 
- the Labor Department (ORG) --[report on]--> Friday (DATE)
- Christopher Waller (PERSON) --[say on]--> Thursday (DATE)
- the Labor Department (ORG) --[report_on]--> Friday (DATE)
- Christopher Waller (PERSON) --[say_on]--> Thursday (DATE)

Relations from article 22: 
- Nasdaq (ORG) --[register]--> third straight week (DATE)
- Nasdaq (ORG) --[decline]--> 345 (CARDINAL)
- Richmond (GPE) --[related_to]--> Virginia (GPE)
- Thursday (DATE) --[related_to]--> Trump (PERSON)
- Nasdaq (ORG) --[register_since]--> mid-July (DATE)

Relations from article 23: 
- Powell (PERSON) --[become in]--> 2018 (DATE)
- Powell (PERSON) --[become_in]--> 2018 (DATE)

Relations from article 24: 

Relations from article 25: 
- Fed (ORG) --[issue in]--> 2025 (DATE)
- Fed (ORG) --[issue_in]--> 2025 (DATE)

Relations from article 26: 
- Reuters - China (ORG) --[announce on]--> Saturday (DATE)
- Ottawa (GPE) --[introduce in]--> October (DATE)
- the White House (ORG) --[threaten]--> Canada (GPE)
- China (GPE) --[apply to]--> just over 1 billion (MONEY)
- China (GPE) --[remind]--> Canada (GPE)
- Justin Trudeau (PERSON) --[say in]--> August (DATE)
- More than half (CARDINAL) --[go to]--> China (GPE)
- Anthony Albanese (PERSON) --[oust]--> Scott Morrison (PERSON)
- the United States (GPE) --[related_to]--> Canada (GPE)
- Mexico (GPE) --[related_to]--> China (GPE)
- Reuters - China (ORG) --[announce_on]--> Saturday (DATE)
- Ottawa (GPE) --[introduce_in]--> October (DATE)
- China (GPE) --[apply_to]--> just over 1 billion (MONEY)
- Justin Trudeau (PERSON) --[say_in]--> August (DATE)
- More than half (CARDINAL) --[go_to]--> China (GPE)

Relations from article 27: 
- Reuters (ORG) --[drop on]--> Friday (DATE)
- Reuters (ORG) --[drop_on]--> Friday (DATE)

Relations from article 28: 
- Chris Wright (PERSON) --[say on]--> Monday (DATE)
- Chris Wright (PERSON) --[say_on]--> Monday (DATE)

Relations from article 29: 
- Reuters - ArcelorMittal (ORG) --[announce on]--> Monday (DATE)
- Reuters - ArcelorMittal (ORG) --[announce_on]--> Monday (DATE)

Relations from article 30: 
- Bilibili (ORG) --[lose]--> 5 (CARDINAL)
- Sunday (DATE) --[related_to]--> Trump (PERSON)
- Donald Trumps (PERSON) --[comment_over]--> the weekend (DATE)

Relations from article 31: 
- Assura (ORG) --[say on]--> Monday (DATE)
- PHP (ORG) --[have until]--> April 7 (DATE)
- Assura (ORG) --[say_on]--> Monday (DATE)
- PHP (ORG) --[have_until]--> April 7 (DATE)

Relations from article 32: 
- Danielle Smith (PERSON) --[say on]--> Monday (DATE)
- Houston (GPE) --[related_to]--> Canada (GPE)
- Danielle Smith (PERSON) --[say_on]--> Monday (DATE)

Relations from article 33: 
- Reuters - HSBC (ORG) --[downgrade on]--> Monday (DATE)
- Morgan Stanleys Wilson (ORG) --[say on]--> Monday (DATE)
- Global Equity Strategist (ORG) --[related_to]--> Alastair Pinder (PERSON)
- Reuters - HSBC (ORG) --[downgrade_on]--> Monday (DATE)
- Morgan Stanleys Wilson (ORG) --[say_on]--> Monday (DATE)

Relations from article 34: 
- thousands (CARDINAL) --[thousand_in]--> Germany (GPE)

Relations from article 35: 
- Trump (PERSON) --[tell]--> Fox News (ORG)
- Volodymyr Zelenskyy (PERSON) --[travel to]--> Saudi Arabia (GPE)
- Volodymyr Zelenskyy (PERSON) --[travel_to]--> Saudi Arabia (GPE)

Relations from article 36: 
- China (GPE) --[related_to]--> Canada (GPE)

Relations from article 37: 
- Trump (ORG) --[impose from]--> Canada Mexico (GPE)
- USDA (ORG) --[administer]--> hundreds (CARDINAL)
- the White House (ORG) --[say on]--> January 22 (DATE)
- Rollins (PERSON) --[say on]--> February 20 (DATE)
- Walton (PERSON) --[award through]--> Agricultural Marketing Service (ORG)
- Ed (PERSON) --[scrimp for]--> years (DATE)
- The West Virginia Food and Farm Coalition (ORG) --[receive]--> about 80 (CARDINAL)
- Belgrade (GPE) --[related_to]--> Montana (PERSON)
- US (GPE) --[related_to]--> Powell-Palm (ORG)
- Trump (ORG) --[impose_from]--> Canada Mexico (GPE)
- Trump (ORG) --[trump_on]--> March 6 (DATE)
- Trump (ORG) --[trump_in]--> his first days (DATE)
- the White House (ORG) --[say_on]--> January 22 (DATE)
- Rollins (PERSON) --[say_on]--> February 20 (DATE)
- Walton (PERSON) --[award_through]--> Agricultural Marketing Service (ORG)
- Ed (PERSON) --[scrimp_for]--> years (DATE)

Relations from article 38: 
- Ford (ORG) --[cut]--> thousands (CARDINAL)
- Ford (ORG) --[cut_in]--> Europe (LOC)
- Ford (ORG) --[cut_in]--> Germany (GPE)

Relations from article 39: 
- Aramco (ORG) --[invest]--> more than 50 billion (MONEY)

Relations from article 40: 
- Donald Trump (PERSON) --[say ABOARD]--> ABOARD AIR FORCE ONE March 9 (ORG)
- Donald Trump (PERSON) --[say_ABOARD]--> ABOARD AIR FORCE ONE March 9 (ORG)
- Donald Trump (PERSON) --[say_on]--> Sunday (DATE)

Relations from article 41: 
- LONDON (GPE) --[open on]--> Monday (DATE)
- LzLabs (ORG) --[develop after]--> nearly a decade (DATE)
- the High Court (ORG) --[rule with]--> Finola OFarrell (PERSON)
- LzLabs UK (ORG) --[related_to]--> Winsopia (PERSON)
- LONDON (GPE) --[open_on]--> Monday (DATE)
- LzLabs (ORG) --[develop_after]--> nearly a decade (DATE)
- the High Court (ORG) --[rule_with]--> Finola OFarrell (PERSON)

Relations from article 42: 
- US (GPE) --[restore to]--> Ukraine (GPE)
- Bruce Kasman (PERSON) --[tell in]--> Singapore (GPE)
- Hong Kong (GPE) --[related_to]--> HSI (ORG)
- Dicks Sporting Goods (ORG) --[related_to]--> DKSN (ORG)
- US (GPE) --[restore_to]--> Ukraine (GPE)
- Bruce Kasman (PERSON) --[tell_in]--> Singapore (GPE)

Relations from article 43: 
- Reuters (ORG) --[report on]--> Tuesday (DATE)
- Ecarx (GPE) --[generate]--> 70 (CARDINAL)
- Half (CARDINAL) --[come by]--> 2030 (DATE)
- Wednesday (DATE) --[related_to]--> Volkswagen (ORG)
- China (GPE) --[related_to]--> Shen (PERSON)
- Reuters (ORG) --[report_on]--> Tuesday (DATE)
- Half (CARDINAL) --[come_by]--> 2030 (DATE)

Relations from article 44: 
- Intel (ORG) --[report]--> first (ORDINAL)
- YORKTAIPEI (ORG) --[related_to]--> Reuters - TSMC 2330TW (ORG)
- Intel (ORG) --[related_to]--> Gelsinger (PRODUCT)

Relations from article 45: 
- Cathay (ORG) --[buy in]--> 2019 (DATE)
- Cathay (ORG) --[attribute at]--> HK Express (ORG)
- CFO (ORG) --[related_to]--> Rebecca Sharpe (PERSON)
- Air China (ORG) --[related_to]--> 601111SS (CARDINAL)
- Cathay (ORG) --[buy_in]--> 2019 (DATE)
- Cathay (ORG) --[attribute_at]--> HK Express (ORG)

Relations from article 46: 
- Reuters - The European Union (ORG) --[impose from]--> next month (DATE)
- the European Commission (ORG) --[say on]--> Wednesday (DATE)
- von der (PERSON) --[related_to]--> Leyen (GPE)
- Reuters - The European Union (ORG) --[impose_from]--> next month (DATE)
- the European Commission (ORG) --[say_on]--> Wednesday (DATE)

Relations from article 47: 
- Goldman Sachs (ORG) --[hold]--> 19 (CARDINAL)
- BMW (ORG) --[cancel in]--> June of last year (DATE)
- STOCKHOLM (GPE) --[related_to]--> Reuters - Northvolt (ORG)
- BMW (ORG) --[cancel_in]--> June of last year (DATE)

Relations from article 48: 
- Christine Lagarde (PERSON) --[say on]--> Wednesday (DATE)
- Christine Lagarde (PERSON) --[say_on]--> Wednesday (DATE)
- the last few weeks (DATE) --[week_in]--> Lagarde (GPE)

Relations from article 49: 
- Rheinmetall (ORG) --[propose for]--> 2024 (DATE)
- Rheinmetall (ORG) --[propose_for]--> 2024 (DATE)

Relations from article 50: 
- Dick Friend (PERSON) --[buy]--> Tesla TSLAO (ORG)
- Tesla TSLAO (ORG) --[open in]--> 2015 (DATE)
- Australian (NORP) --[related_to]--> Dick Friend (PERSON)
- Tuesday (DATE) --[related_to]--> evening (TIME)
- Tesla TSLAO (ORG) --[open_in]--> 2015 (DATE)
- the same time (DATE) --[time_over]--> the last week (DATE)

Relations from article 51: 

Relations from article 52: 

Relations from article 53: 
- JLR (ORG) --[host In]--> November (DATE)
- Tata (ORG) --[push in]--> January (DATE)
- NEW DELHI (GPE) --[related_to]--> Reuters - Jaguar Land Rover (ORG)
- Britain (GPE) --[related_to]--> Europe (LOC)
- JLR (ORG) --[host_In]--> November (DATE)
- Tata (ORG) --[push_in]--> January (DATE)
- Tata (ORG) --[push_to]--> 2026-2027 (DATE)

Relations from article 54: 
- Bershka (ORG) --[launch in]--> Sweden (GPE)
- China (GPE) --[related_to]--> Mexico (GPE)

Relations from article 55: 

Relations from article 56: 
- the Wall Street Journal (ORG) --[report on]--> Monday (DATE)
- Half (CARDINAL) --[related_to]--> Moon (PERSON)
- Half (CARDINAL) --[related_to]--> Moon Capital (ORG)
- the Wall Street Journal (ORG) --[report_on]--> Monday (DATE)

Relations from article 57: 

Relations from article 58: 
- Raphael Bostic (PERSON) --[say on]--> Monday (DATE)
- He Lifeng (PERSON) --[meet on]--> Sunday (DATE)
- Raphael Bostic (PERSON) --[say_on]--> Monday (DATE)
- He Lifeng (PERSON) --[meet_on]--> Sunday (DATE)

Relations from article 59: 
- Donald Trump (PERSON) --[take on]--> January 20 (DATE)
- Boeing (ORG) --[agree In]--> July (DATE)
- Boeing (ORG) --[win on]--> Friday (DATE)
- Fort Worth (GPE) --[related_to]--> Texas (GPE)
- Donald Trump (PERSON) --[take_on]--> January 20 (DATE)
- Boeing (ORG) --[agree_In]--> July (DATE)
- Boeing (ORG) --[win_on]--> Friday (DATE)

Relations from article 60: 
- Sam Altman (PERSON) --[say on]--> Monday (DATE)
- Lightcap (PERSON) --[join]--> OpenAI (ORG)
- post Altman (ORG) --[say in]--> February (DATE)
- Sam Altman (PERSON) --[say_on]--> Monday (DATE)
- Lightcap (PERSON) --[join_in]--> 2018 (DATE)
- post Altman (ORG) --[say_in]--> February (DATE)

Relations from article 61: 

Relations from article 62: 

Relations from article 63: 

Relations from article 64: 
- Reuters (ORG) --[risk between]--> mid-July (DATE)
- Reuters (ORG) --[risk_between]--> mid-July (DATE)

Relations from article 65: 
- Jeff Landry (PERSON) --[travel to]--> South Korea (GPE)
- Trump (ORG) --[say on]--> Monday (DATE)
- Automakers (GPE) --[lobby]--> the White House (ORG)
- Mary Barra (PERSON) --[meet with]--> Trump (ORG)
- Jeff Landry (PERSON) --[travel_to]--> South Korea (GPE)
- Jeff Landry (PERSON) --[travel_in]--> October (DATE)
- Trump (ORG) --[say_on]--> Monday (DATE)
- Mary Barra (PERSON) --[meet_with]--> Trump (ORG)

Relations from article 66: 
- late-night (TIME) --[related_to]--> Tesla (NORP)

Relations from article 67: 
- Menlo Park California (GPE) --[related_to]--> Robinhood (ORG)
- CFTC (ORG) --[related_to]--> Brian Quintenz (PERSON)
- March 17 came a month (DATE) --[come_after]--> Robinhood (PRODUCT)

Relations from article 68: 
- China (GPE) --[related_to]--> Mexico (GPE)

Relations from article 69: 
- PMI (ORG) --[drop to]--> 498 (CARDINAL)
- PMI (ORG) --[slip to]--> 517 (CARDINAL)
- PMI (ORG) --[rise to]--> 543 (CARDINAL)
- PMI (ORG) --[fall to]--> 508 (CARDINAL)
- PMI (ORG) --[drop_to]--> 498 (CARDINAL)
- PMI (ORG) --[drop_from]--> 527 (CARDINAL)
- PMI (ORG) --[slip_to]--> 517 (CARDINAL)
- PMI (ORG) --[rise_to]--> 543 (CARDINAL)
- PMI (ORG) --[rise_from]--> 510 (CARDINAL)
- PMI (ORG) --[fall_to]--> 508 (CARDINAL)

Relations from article 70: 
- HDB (ORG) --[tell]--> Reuters (ORG)
- Germanys (GPE) --[related_to]--> HDB (ORG)

Relations from article 71: 
- Joe Tsai (PERSON) --[say on]--> Tuesday (DATE)
- Reuters - Alibaba Group (ORG) --[related_to]--> 9988HK (CARDINAL)
- Joe Tsai (PERSON) --[say_on]--> Tuesday (DATE)

Relations from article 72: 

Relations from article 73: 
- Howard Lutnick (PERSON) --[related_to]--> India (GPE)

Relations from article 74: 
- PsiQuantum (ORG) --[raise]--> at least 750 million (CARDINAL)
- Brisbane (GPE) --[related_to]--> Australia (GPE)

Relations from article 75: 
- Reuters - India (ORG) --[order]--> Samsung (ORG)

Relations from article 76: 
- Tuesday (DATE) --[related_to]--> morning (TIME)

Relations from article 77: 
- Reuters - Teslas TSLAO (ORG) --[open in]--> Europe (LOC)
- Tesla (ORG) --[command]--> 18 (CARDINAL)
- PHEV (ORG) --[account for]--> 584 (CARDINAL)
- Chris Heron (PERSON) --[tell]--> Reuters (ORG)
- the European Union (ORG) --[related_to]--> Britain (GPE)
- Reuters - Teslas TSLAO (ORG) --[open_in]--> Europe (LOC)
- Tesla (ORG) --[command_in]--> February (DATE)
- PHEV (ORG) --[account_for]--> 584 (CARDINAL)
- PHEV (ORG) --[account_in]--> February (DATE)

Relations from article 78: 
- the Securities and Exchange Commission (ORG) --[approve in]--> January 2024 (DATE)
- the Securities and Exchange Commission (ORG) --[approve_in]--> January 2024 (DATE)

Relations from article 79: 

Relations from article 80: 
- Hyundai (ORG) --[related_to]--> Steels (ORG)
- Lee Tae-hwan (PERSON) --[analyst_at]--> Daishin Securities Hyundai Steel (ORG)

Relations from article 81: 
- Trump (ORG) --[say for]--> weeks (DATE)
- Trump (ORG) --[announce on]--> Monday (DATE)
- Bessent (ORG) --[refer as]--> 15 (CARDINAL)
- Hassett (PERSON) --[tell]--> Fox Business (ORG)
- Venezuela (GPE) --[send]--> tens of thousands (CARDINAL)
- Jeff Landry (PERSON) --[related_to]--> Trump (PERSON)
- Brazil (GPE) --[related_to]--> China (GPE)
- Canada (GPE) --[related_to]--> China (GPE)
- Trump (ORG) --[say_for]--> weeks (DATE)
- Trump (ORG) --[announce_on]--> Monday (DATE)
- Bessent (ORG) --[refer_as]--> 15 (CARDINAL)
- Venezuela (GPE) --[send_to]--> the United States (GPE)

Relations from article 82: 

Relations from article 83: 
- Reuters - China (ORG) --[narrow to]--> just three months (DATE)
- Lee (PERSON) --[found]--> 01AI (CARDINAL)
- Earlier this month 01AI (DATE) --[launch]--> Wanzhi (PRODUCT)
- Reuters - China (ORG) --[narrow_to]--> just three months (DATE)
- Lee (PERSON) --[found_in]--> March 2023 (DATE)

Relations from article 84: 
- the United States (GPE) --[related_to]--> Japans (NORP)

Relations from article 85: 

Relations from article 86: 
- BYD (ORG) --[build in]--> Brazil (GPE)
- Tuesday (DATE) --[related_to]--> BYD (ORG)
- Wednesday (DATE) --[related_to]--> BYD (ORG)
- BYD (ORG) --[build_in]--> Brazil (GPE)

Relations from article 87: 
- The Australian Defence Force (ORG) --[trialle at]--> Base Darwin (PERSON)
- Anduril (PERSON) --[develop with]--> the Australian Defence Force (ORG)
- The Australian Defence Force (ORG) --[trialle_at]--> Base Darwin (PERSON)
- Anduril (PERSON) --[develop_with]--> the Australian Defence Force (ORG)

Relations from article 88: 

Relations from article 89: 
- the day ahead (DATE) --[day_from]--> Kevin Buckland (PERSON)

Relations from article 90: 
- Reuters - Tesla (ORG) --[launch in]--> Saudi Arabia (GPE)
- The Wall Street Journal (ORG) --[report in]--> 2023 (DATE)
- Reuters - Tesla (ORG) --[launch_in]--> Saudi Arabia (GPE)
- The Wall Street Journal (ORG) --[report_in]--> 2023 (DATE)

Relations from article 91: 

Relations from article 92: 

Relations from article 93: 

Relations from article 94: 
- Five (CARDINAL) --[base in]--> China (GPE)
- one (CARDINAL) --[place in]--> 2023 (DATE)
- The Beijing Academy of Artificial Intelligence BAAI (ORG) --[say on]--> Wednesday (DATE)
- Five (CARDINAL) --[base_in]--> China (GPE)
- one (CARDINAL) --[place_in]--> 2023 (DATE)
- The Beijing Academy of Artificial Intelligence BAAI (ORG) --[say_on]--> Wednesday (DATE)

Relations from article 95: 
- TAIPEI (ORG) --[defend on]--> Wednesday (DATE)
- Bessent (ORG) --[refer as]--> 15 (CARDINAL)
- TAIPEI (ORG) --[defend_on]--> Wednesday (DATE)
- Bessent (ORG) --[refer_as]--> 15 (CARDINAL)

Relations from article 96: 

Relations from article 97: 
- Chrystia Freeland (PERSON) --[say on]--> Tuesday (DATE)
- Canada (GPE) --[freeze]--> C43 million 3011 million (MONEY)
- Mark Carney (PERSON) --[announce on]--> April 28 (DATE)
- Tesla (ORG) --[file in]--> January (DATE)
- Chrystia Freeland (PERSON) --[say_on]--> Tuesday (DATE)
- Mark Carney (PERSON) --[announce_on]--> April 28 (DATE)
- Tesla (ORG) --[file_in]--> January (DATE)

Relations from article 98: 
- Christopher Landau (PERSON) --[speak with]--> Vikram Misri (PERSON)
- Landau (PERSON) --[thank]--> India (GPE)
- Washington (GPE) --[related_to]--> India (GPE)
- Christopher Landau (PERSON) --[speak_with]--> Vikram Misri (PERSON)

Relations from article 99: 
- Friday (DATE) --[related_to]--> Glass Lewis (PERSON)

Relations from article 100: 
- TikTok (NORP) --[go in]--> US (GPE)
- nearly half (CARDINAL) --[half_of]--> Americans (NORP)
- TikTok (NORP) --[go_in]--> US (GPE)
- TikTok (NORP) --[go_in]--> January (DATE)

Relations from article 101: 

Relations from article 102: 
- the Washington Post (ORG) --[report on]--> Saturday (DATE)
- Friday (DATE) --[related_to]--> Trump (PERSON)
- the Washington Post (ORG) --[report_on]--> Saturday (DATE)

Relations from article 103: 
- Friday (DATE) --[related_to]--> Reuters (ORG)

Relations from article 104: 
- Brussels (GPE) --[give]--> three years (DATE)
- Saturday (DATE) --[related_to]--> Carmakers (NORP)

Relations from article 105: 
- The US Department of Labor (ORG) --[investigate]--> Scale AI (ORG)
- Founded (PERSON) --[found_in]--> Scale AI (PERSON)

Relations from article 106: 
- the White House (ORG) --[say on]--> Watson (PERSON)
- Friday (DATE) --[related_to]--> Watson (PERSON)
- CNBC (ORG) --[related_to]--> Watson (PERSON)
- the White House (ORG) --[say_on]--> Watson (PERSON)
- Founded (PERSON) --[found_in]--> 2013 (DATE)

Relations from article 107: 
- 500 (CARDINAL) --[decline]--> 197 (CARDINAL)
- Nasdaq (ORG) --[decline]--> 26 (CARDINAL)
- UBS Global Wealth Management (ORG) --[lower from]--> 6600 (CARDINAL)
- UBS Global Wealth Management (ORG) --[lower_from]--> 6600 (CARDINAL)

Relations from article 108: 

Relations from article 109: 
- Memphis (GPE) --[related_to]--> Tennessee (GPE)

Relations from article 110: 

Relations from article 111: 
- DeepSeek (PRODUCT) --[take on]--> Thursday (DATE)
- CoreWeaves (ORG) --[related_to]--> IPO (ORG)
- DeepSeek (PRODUCT) --[take_on]--> Thursday (DATE)

5. Knowledge Graph Construction¶

Now let's build a knowledge graph from the extracted entities and relations.

In [10]:
# Initialize knowledge graph builder
kg_builder = KnowledgeGraphBuilder(namespace="http://example.org/graphify/")

# Collect all entities and relations
all_kg_entities = set()
all_kg_relations = []

for entities in all_entities:
    for entity in entities:
        all_kg_entities.add(entity)

for relations in all_relations:
    all_kg_relations.extend(relations)

# Add to knowledge graph
kg_builder.add_entities_and_relations(list(all_kg_entities), all_kg_relations)

# Save knowledge graph to file
kg_file = "output/data/knowledge_graph.ttl"
kg_builder.save_to_file(kg_file, format="turtle")
print(f"\nKnowledge graph saved to {kg_file}")

# Display stats
print(f"\nKnowledge Graph Statistics:")
print(f"- Entities: {len(all_kg_entities)}")
print(f"- Relations: {len(all_kg_relations)}")
print(f"- Triples: {len(kg_builder.get_triples())}")
2025-03-29 16:40:23,549 - src.knowledge_graph.builder - INFO - Initialized knowledge graph with namespace: http://example.org/graphify/
2025-03-29 16:40:23,872 - src.knowledge_graph.builder - INFO - Added 3102 entities and 331 relations
2025-03-29 16:40:24,108 - src.knowledge_graph.builder - INFO - Saved knowledge graph to output/data/knowledge_graph.ttl in turtle format
Knowledge graph saved to output/data/knowledge_graph.ttl

Knowledge Graph Statistics:
- Entities: 3102
- Relations: 331
- Triples: 6625

6. Visualization¶

Let's create visualizations of our knowledge graph.

In [11]:
# Create visualizations
# Before visualizing, filter out nodes without relations

# Get all entities that participate in relationships
connected_entities = set()
for relations in all_relations:
    for subject, predicate, obj in relations:
        connected_entities.add(subject[0])  # Add subject text
        connected_entities.add(obj[0])      # Add object text

# Filter the entities to only include those with relations
connected_kg_entities = [entity for entity in all_kg_entities if entity[0] in connected_entities]

# Now use the filtered entities for visualization
kg_builder = KnowledgeGraphBuilder(namespace="http://example.org/graphify/")
kg_builder.add_entities_and_relations(connected_kg_entities, all_kg_relations)

# Static plot
plot_file = "output/visualization/knowledge_graph_plot.png"
kg_builder.plot_graph(plot_file)
print(f"Static plot saved to {plot_file}")

# Interactive visualization
vis_file = "output/visualization/knowledge_graph_interactive.html"
kg_builder.visualize(vis_file)
print(f"Interactive visualization saved to {vis_file}")

# Display the static plot in the notebook
from IPython.display import Image
Image(filename=plot_file)
2025-03-29 16:40:24,126 - src.knowledge_graph.builder - INFO - Initialized knowledge graph with namespace: http://example.org/graphify/
2025-03-29 16:40:24,166 - src.knowledge_graph.builder - INFO - Added 354 entities and 331 relations
2025-03-29 16:40:27,293 - src.knowledge_graph.builder - INFO - Saved plot to output/visualization/knowledge_graph_plot.png
2025-03-29 16:40:27,361 - src.knowledge_graph.builder - INFO - Saved visualization to output/visualization/knowledge_graph_interactive.html
Static plot saved to output/visualization/knowledge_graph_plot.png
Interactive visualization saved to output/visualization/knowledge_graph_interactive.html
Out[11]:
No description has been provided for this image
No description has been provided for this image

7. Querying the Knowledge Graph¶

Let's run some SPARQL queries on our knowledge graph.

In [ ]:
# Example query: Find all organizations and their locations
query1 = """
PREFIX ns: <http://example.org/graphify/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?org ?loc
WHERE {
    ?org_uri rdf:type ns:ORG .
    ?loc_uri rdf:type ns:GPE .
    ?org_uri ?pred ?loc_uri .
    ?org_uri <http://www.w3.org/2000/01/rdf-schema#label> ?org .
    ?loc_uri <http://www.w3.org/2000/01/rdf-schema#label> ?loc .
}
"""

results1 = kg_builder.query_sparql(query1)
print("Organizations and their locations:")
for row in results1:
    print(f"- {row[0]} is located in {row[1]}")
Organizations and their locations:
- the European Union is located in Britain
- Reuters - Tesla is located in Saudi Arabia
- Trump is located in Canada Mexico
- the White House is located in Canada
- Bershka is located in Sweden
- Ford is located in Germany
- BYD is located in Brazil

8. Exporting Results for Further Analysis¶

Let's export the entities and relations to CSV files for further analysis.

In [14]:
# Export entities to CSV
entity_data = []
for entity_text, entity_type in all_kg_entities:
    entity_data.append({'entity': entity_text, 'type': entity_type})

entity_df = pd.DataFrame(entity_data)
entity_csv = "output/data/entities.csv"
entity_df.to_csv(entity_csv, index=False)
print(f"Entities exported to {entity_csv}")

# Export relations to CSV
relation_data = []
for subject, predicate, obj in all_kg_relations:
    relation_data.append({
        'subject': subject[0],
        'subject_type': subject[1],
        'predicate': predicate,
        'object': obj[0],
        'object_type': obj[1]
    })

relation_df = pd.DataFrame(relation_data)
relation_csv = "output/data/relations.csv"
relation_df.to_csv(relation_csv, index=False)
print(f"Relations exported to {relation_csv}")
Entities exported to output/data/entities.csv
Relations exported to output/data/relations.csv

9. Conclusion¶

In this notebook, we've demonstrated the complete pipeline for building a knowledge graph from raw text data:

  1. We collected articles using a web scraper
  2. We preprocessed and cleaned the text
  3. We extracted named entities using spaCy
  4. We extracted relations between entities
  5. We built a knowledge graph from the entities and relations
  6. We visualized and queried the knowledge graph

This pipeline can be extended by:

  • Training a CRF model for NER and comparing it with spaCy's performance
  • Implementing more sophisticated relation extraction methods
  • Adding more data sources
  • Expanding the knowledge graph with additional entity types and relations
  • Developing applications that use the knowledge graph for search, recommendation, or other tasks